[ 
https://issues.apache.org/jira/browse/PHOENIX-7893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell updated PHOENIX-7893:
-----------------------------------------
    Description: 
{{LocalIndexIT}} {{testLocalIndexReverseScanShouldReturnAllRows}} and 
{{testLocalIndexUsedForUncoveredOrderBy}} can fail with a 
{{StackOverflowError}}.

The issue dates back to PHOENIX-4967 and PHOENIX-4964 and occurs when a reverse 
scan runs over a multi-region, pre-split local index. The query fails during 
{{executeQuery()}} with:

{noformat}
java.lang.StackOverflowError
    at org.apache.phoenix.iterate.BaseResultIterators.close(BaseResultIterators.java:1732)
    at org.apache.phoenix.iterate.BaseResultIterators.getIterators(BaseResultIterators.java:1635)
    at org.apache.phoenix.iterate.BaseResultIterators.recreateIterators(BaseResultIterators.java:1688)
    at org.apache.phoenix.iterate.BaseResultIterators.getIterators(BaseResultIterators.java:1584)
    at org.apache.phoenix.iterate.BaseResultIterators.recreateIterators(BaseResultIterators.java:1688)
    at org.apache.phoenix.iterate.BaseResultIterators.getIterators(BaseResultIterators.java:1584)
    ... (repeats thousands of times) ...
{noformat}

Test logs contain the same {{StaleRegionBoundaryCacheException}} more than 
10,000 times against the same region.

There are two related bugs. The first is a server-side boundary-check bug that 
causes a permanent false-positive {{StaleRegionBoundaryCacheException}}. The 
second is an unbounded client-side retry that produces a {{StackOverflowError}} 
instead of a clean failure.
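
To illustrate the second bug, here is a minimal sketch of a bounded retry. It is not the Phoenix client code: {{BoundedRetrySketch}}, {{StaleBoundaryException}}, and {{runWithRetries}} are hypothetical names, and the retry limit is an assumed parameter. It only shows how a loop with a cap turns a permanent failure into a clean exception instead of ever-deeper recursion.

```java
import java.util.function.Supplier;

// Standalone sketch; it does not use any Phoenix or HBase classes.
final class BoundedRetrySketch {
    /** Hypothetical stand-in for StaleRegionBoundaryCacheException. */
    static final class StaleBoundaryException extends RuntimeException {
        StaleBoundaryException(String message) {
            super(message);
        }
    }

    /**
     * Runs the attempt up to maxRetries + 1 times. Retrying in a loop keeps
     * the stack depth constant, unlike mutually recursive retry calls.
     */
    static <T> T runWithRetries(Supplier<T> attempt, int maxRetries) {
        StaleBoundaryException last = null;
        for (int i = 0; i <= maxRetries; i++) {
            try {
                return attempt.get();
            } catch (StaleBoundaryException e) {
                // Assumption: a real client would refresh its region cache here.
                last = e;
            }
        }
        throw new IllegalStateException(
            "gave up after " + (maxRetries + 1) + " attempts", last);
    }

    public static void main(String[] args) {
        try {
            // Simulates the permanent false positive: every attempt fails.
            runWithRetries(() -> {
                throw new StaleBoundaryException("permanent false positive");
            }, 3);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // prints "gave up after 4 attempts"
        }
    }
}
```

With a cap in place, the false positive from the first bug would still fail the query, but with an exception that names the cause rather than a {{StackOverflowError}}.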

The trigger is a server-side check in {{BaseScannerRegionObserver.java}}:

{noformat}
if (isLocalIndex) {
  byte[] expectedUpperRegionKey =
      scan.getAttribute(EXPECTED_UPPER_REGION_KEY) == null
          ? scan.getStopRow()                       // <-- fallback used by ALL regular queries
          : scan.getAttribute(EXPECTED_UPPER_REGION_KEY);
  byte[] actualStartRow = scan.getAttribute(SCAN_ACTUAL_START_ROW);
  isStaleRegionBoundaries =
      (expectedUpperRegionKey != null
          && Bytes.compareTo(upperExclusiveRegionKey, expectedUpperRegionKey) != 0)
      || (actualStartRow != null
          && Bytes.compareTo(actualStartRow, lowerInclusiveRegionKey) < 0);
}
{noformat}

When the client builds a local index scan in 
{{ScanUtil.setLocalIndexAttributes}}, {{SCAN_ACTUAL_START_ROW}} is set, but 
{{EXPECTED_UPPER_REGION_KEY}} is not. A repository-wide search confirms that 
{{EXPECTED_UPPER_REGION_KEY}} is only ever set in {{PhoenixInputFormat}}, so on 
the regular query path the server always falls back to {{scan.getStopRow()}}. 
For a reversed scan, however, {{startRow}} is the high bound and {{stopRow}} is 
the low bound.

There is evidence of this problem in the test logs, e.g.:

{noformat}
Throwing StaleRegionBoundaryCacheException due to mismatched scan boundaries.
  Region: ...,o\x00...\x00,...
  lowerInclusiveScanKey:                       (empty  -> high end of reverse scan)
  upperExclusiveScanKey: o\x00\x00...\x00      (= scan.getStopRow(), the LOW bound)
  lowerInclusiveRegionKey: o\x00\x00...\x00
  upperExclusiveRegionKey:                     (empty  -> last region)
  scan reversed: true
{noformat}

{{expectedUpperRegionKey = scan.getStopRow() = o\x00…}} is wrong: this is the 
lower bound of the scan.

{{upperExclusiveRegionKey = ""}} is empty, which is the real upper boundary of 
the last region.

The two values differ, so {{isStaleRegionBoundaries = true}} even though the 
region boundaries are not actually stale.
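
The comparison can be reproduced in isolation. The following is a minimal sketch, not the Phoenix code: {{StaleBoundaryCheckSketch}} and its methods are hypothetical, {{Arrays.compareUnsigned}} stands in for {{Bytes.compareTo}}, the three-byte key is a shortened stand-in for the elided key in the log, and only the first clause of the server-side check is modeled.

```java
import java.util.Arrays;

// Standalone sketch; it does not use any Phoenix or HBase classes.
final class StaleBoundaryCheckSketch {
    /** Stand-in for Bytes.compareTo: unsigned lexicographic byte comparison. */
    static int compare(byte[] a, byte[] b) {
        return Arrays.compareUnsigned(a, b);
    }

    /**
     * Models the first clause of the server-side check: fall back to the
     * scan's stop row when the expected-upper-key attribute is not set.
     */
    static boolean isStale(byte[] expectedUpperAttr, byte[] stopRow,
                           byte[] upperExclusiveRegionKey) {
        byte[] expectedUpperRegionKey =
            expectedUpperAttr == null ? stopRow : expectedUpperAttr;
        return expectedUpperRegionKey != null
            && compare(upperExclusiveRegionKey, expectedUpperRegionKey) != 0;
    }

    public static void main(String[] args) {
        byte[] regionStart = {'o', 0, 0};      // lowerInclusiveRegionKey (shortened)
        byte[] regionEnd = {};                 // upperExclusiveRegionKey: empty for the last region
        byte[] reversedStopRow = regionStart;  // reversed scan: stopRow is the LOW bound

        // Attribute unset: the stop-row fallback is compared against the
        // region's upper key, and the mismatch is reported as stale.
        System.out.println(isStale(null, reversedStopRow, regionEnd));      // prints "true"

        // Attribute set to the region's real upper key: no false positive.
        System.out.println(isStale(regionEnd, reversedStopRow, regionEnd)); // prints "false"
    }
}
```

The second call reflects the direction named in the issue title, populating {{EXPECTED_UPPER_REGION_KEY}} on local index scans, under the assumption that the client sets it to the region's upper boundary.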

  was:
{{LocalIndexIT}} {{testLocalIndexReverseScanShouldReturnAllRows}} and 
{{testLocalIndexUsedForUncoveredOrderBy}} can fail with a 
{{StackOverflowError}}.

The issue dates back to PHOENIX-4967 and PHOENIX-4964 and occurs when a reverse 
scan runs over a multi-region, pre-split local index. The query fails during 
{{executeQuery()}} with:

{noformat}
java.lang.StackOverflowError
    at org.apache.phoenix.iterate.BaseResultIterators.close(BaseResultIterators.java:1732)
    at org.apache.phoenix.iterate.BaseResultIterators.getIterators(BaseResultIterators.java:1635)
    at org.apache.phoenix.iterate.BaseResultIterators.recreateIterators(BaseResultIterators.java:1688)
    at org.apache.phoenix.iterate.BaseResultIterators.getIterators(BaseResultIterators.java:1584)
    at org.apache.phoenix.iterate.BaseResultIterators.recreateIterators(BaseResultIterators.java:1688)
    at org.apache.phoenix.iterate.BaseResultIterators.getIterators(BaseResultIterators.java:1584)
    ... (repeats thousands of times) ...
{noformat}

Test logs contain the same {{StaleRegionBoundaryCacheException}} more than 
10,000 times against the same region.

There are two related bugs. The first is a server-side boundary-check bug that 
causes a permanent false-positive {{StaleRegionBoundaryCacheException}}. The 
second is an unbounded client-side retry that produces a {{StackOverflowError}} 
instead of a clean failure.


> Populate EXPECTED_UPPER_REGION_KEY on local index scans
> -------------------------------------------------------
>
>                 Key: PHOENIX-7893
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-7893
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 5.4.0, 5.3.1
>            Reporter: Andrew Kyle Purtell
>            Assignee: Andrew Kyle Purtell
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
