Rushabh Shah created PHOENIX-7002:
-------------------------------------
Summary: Insufficient logging in phoenix client when server throws
StaleRegionBoundaryCacheException.
Key: PHOENIX-7002
URL: https://issues.apache.org/jira/browse/PHOENIX-7002
Project: Phoenix
Issue Type: Bug
Reporter: Rushabh Shah
Assignee: Rushabh Shah
We saw an incident in a production cluster where Phoenix returned results
outside the range provided by the customer. hbck repair runs were going on
while the query was running. We suspect that region holes existed in the table
when the query started (there is no way to confirm this). While the query was
still running, we ran an hbck repair operation, which caused region overlaps
(this is confirmed, since the overlap persisted after the query).
Unfortunately, there were no exceptions, errors, or stack traces logged on
either the client or the server side.
After a query runs, we log its execution time and the number of exceptions
encountered in a single log line. That line shows this query encountered
[StaleRegionBoundaryCacheException|https://github.com/apache/phoenix/blob/4.16/phoenix-core/src/main/java/org/apache/phoenix/monitoring/MetricType.java#L57].
There is some logic in BaseResultIterators that adjusts the start and end
keys of the scan range. See
[here|https://github.com/apache/phoenix/blob/4.16/phoenix-core/src/main/java/org/apache/phoenix/iterate/BaseResultIterators.java#L688-L730].
Without knowing the state of meta or the exceptions encountered, it is very
difficult to debug why this happened.
At the very least, we would want to log all the exceptions on the phoenix
client side.
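
A rough sketch of the kind of client-side logging I have in mind is below. It
is not Phoenix code: RetryLoggingSketch, StaleBoundaryException (a stand-in
for StaleRegionBoundaryCacheException), runWithRetry and the maxAttempts
parameter are all hypothetical names made up for illustration, and it uses
java.util.logging only to stay self-contained. The point is simply that each
stale-boundary failure gets logged with the table, the scan key range and the
attempt number before we retry.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.logging.Level;
import java.util.logging.Logger;

final class RetryLoggingSketch {
    private static final Logger LOG =
            Logger.getLogger(RetryLoggingSketch.class.getName());

    /** Hypothetical stand-in for Phoenix's StaleRegionBoundaryCacheException. */
    static final class StaleBoundaryException extends Exception {
        StaleBoundaryException(String message) {
            super(message);
        }
    }

    /** Copy of every logged message, kept only so the sketch is easy to check. */
    static final List<String> logged = new ArrayList<>();

    /**
     * Runs the scan, retrying on stale-boundary failures. Every failure is
     * logged with its stack trace before the retry, so it is never silent.
     * maxAttempts is the total number of tries (assumed to be at least 1).
     */
    static <T> T runWithRetry(String table, byte[] startKey, byte[] stopKey,
                              int maxAttempts, Callable<T> scan) throws Exception {
        for (int attempt = 1; ; attempt++) {
            try {
                return scan.call();
            } catch (StaleBoundaryException e) {
                String msg = String.format(
                        "Stale region boundaries: table=%s start=%s stop=%s attempt=%d/%d",
                        table, toHex(startKey), toHex(stopKey), attempt, maxAttempts);
                logged.add(msg);
                LOG.log(Level.WARNING, msg, e); // passing e keeps the stack trace
                if (attempt >= maxAttempts) {
                    throw e; // out of attempts: surface the failure to the caller
                }
            }
        }
    }

    /** Renders a row key as lowercase hex so binary keys are readable in logs. */
    static String toHex(byte[] key) {
        StringBuilder sb = new StringBuilder();
        for (byte b : key) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }
}
```

In the real client the log line would presumably also want the region
boundaries the client had cached at the time, since that is the state we could
not reconstruct in this incident.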
--
This message was sent by Atlassian Jira
(v8.20.10#820010)