[ 
https://issues.apache.org/jira/browse/PHOENIX-2599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15099150#comment-15099150
 ] 

James Taylor commented on PHOENIX-2599:
---------------------------------------

PhoenixRecordReader (which is used for MR, Spark, and Pig integration) does not 
deal with a StaleRegionBoundaryCacheException. This exception occurs if a split 
occurs while the job is running. In the standard JDBC code path, when a query 
is executed in Phoenix, we detect this exception and retry the batches that 
have failed. The reason this is necessary is for cases in which a merge sort is 
being done among the parallel scans we run. For example, if an ORDER BY is 
performed, the rows we get back from each scan are required to be sorted so 
that the merge sort correctly produces sorted rows. If a scan spans a region 
boundary (which is what this exception means), then the resulting rows we get 
back from the scan are from two separate region scanners and will not be 
completely ordered.
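To make that ordering requirement concrete, here is a small, self-contained 
illustration (not Phoenix code) of the k-way merge invariant: the merged output 
is only globally sorted if every input iterator is itself sorted, which is 
exactly what a boundary-spanning scan can violate.

{code:java}
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

public class MergeSortInvariant {
    // K-way merge over per-scan iterators, keyed by each iterator's current value.
    // Correct only when every input iterator yields values in sorted order.
    static List<Integer> merge(List<Iterator<Integer>> inputs) {
        PriorityQueue<int[]> heap =
            new PriorityQueue<>(Comparator.comparingInt((int[] e) -> e[0])); // [value, inputIndex]
        for (int i = 0; i < inputs.size(); i++) {
            if (inputs.get(i).hasNext()) {
                heap.add(new int[] { inputs.get(i).next(), i });
            }
        }
        List<Integer> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] top = heap.poll();
            out.add(top[0]);
            Iterator<Integer> it = inputs.get(top[1]);
            if (it.hasNext()) {
                heap.add(new int[] { it.next(), top[1] });
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Both "scans" sorted: merged output is globally sorted -> [1, 2, 4, 5, 7, 8]
        System.out.println(merge(List.of(
            List.of(1, 4, 7).iterator(), List.of(2, 5, 8).iterator())));
        // One "scan" not fully sorted (as described above for a scan that spans a
        // region boundary): the merged output is no longer globally sorted.
        System.out.println(merge(List.of(
            List.of(4, 7, 1).iterator(), List.of(2, 5, 8).iterator())));
    }
}
{code}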

Since our MR, Spark, and Pig integration does not support any constructs that 
require merge sorting, the simplest fix would be to skip this check on the 
region server (sketched below). This would require the following changes:
- In PhoenixRecordReader.initialize(), within the loop over the Scan objects, 
set a new attribute on each Scan (add a new constant to 
BaseScannerRegionObserver such as SKIP_REGION_BOUNDARY_CHECK).
- In BaseScannerRegionObserver.preScannerOpen(), skip the call to 
throwIfScanOutOfRegion() when this new attribute is set.
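A minimal sketch of those two changes (the constant name, the attribute 
encoding, and the helper placement are assumptions, not committed code):

{code:java}
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class SkipBoundaryCheckSketch {
    // Proposed constant; the real one would live in BaseScannerRegionObserver.
    public static final String SKIP_REGION_BOUNDARY_CHECK = "_SkipRegionBoundaryCheck";

    // Client side, in PhoenixRecordReader.initialize(): tag each Scan so the
    // server-side coprocessor knows the MR/Spark/Pig client tolerates a scan
    // that spans a region boundary.
    public static void markScan(Scan scan) {
        scan.setAttribute(SKIP_REGION_BOUNDARY_CHECK, Bytes.toBytes(true));
    }

    // Server side, in BaseScannerRegionObserver.preScannerOpen(): only call
    // throwIfScanOutOfRegion() when the attribute is absent or false.
    public static boolean shouldCheckRegionBoundary(Scan scan) {
        byte[] skip = scan.getAttribute(SKIP_REGION_BOUNDARY_CHECK);
        return skip == null || !Bytes.toBoolean(skip);
    }
}
{code}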

For our MR integration, where I'm assuming a new HConnection is established 
for each job, I'm pretty sure the above approach is fine. For our Spark 
integration, I'm not positive. What's the lifecycle of the HConnection that's 
established to HBase with our integration, [~jmahonin]? Is it held open across 
multiple query invocations? If so, we may want a different solution, because 
this exception means that the HConnection on the client side (which caches the 
region boundaries) is out of sync with the actual region boundaries. In that 
case, we'd want to clear this cache when we get the exception and then re-run 
the same scan (see the retry sketch below). An alternative would be to bypass 
the cache when we look up the region boundaries (at the cost of another RPC).
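For the cache-clearing alternative, the client-side pattern would look roughly 
like the following; runScan() and clearCachedRegionBoundaries() are 
hypothetical placeholders for whatever mechanism issues the scan and 
invalidates the cached region locations:

{code:java}
import org.apache.phoenix.schema.StaleRegionBoundaryCacheException;

public class StaleBoundaryRetrySketch {
    private static final int MAX_RETRIES = 3; // arbitrary, for illustration only

    interface ScanRunner<T> {
        T runScan() throws Exception;        // hypothetical: issues the scan
        void clearCachedRegionBoundaries();  // hypothetical: drops stale cache entries
    }

    // Retry the scan after clearing the client-side region boundary cache, so the
    // next attempt re-resolves the region locations instead of reusing stale ones.
    static <T> T runWithRetry(ScanRunner<T> runner) throws Exception {
        for (int attempt = 1; ; attempt++) {
            try {
                return runner.runScan();
            } catch (StaleRegionBoundaryCacheException e) {
                if (attempt >= MAX_RETRIES) {
                    throw e;
                }
                runner.clearCachedRegionBoundaries();
            }
        }
    }
}
{code}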

Would you have cycles to provide a fix, [~jmahonin]?

> PhoenixRecordReader does not handle StaleRegionBoundaryCacheException
> ---------------------------------------------------------------------
>
>                 Key: PHOENIX-2599
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2599
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.5.1
>         Environment: HBase 1.0 + Linux
>            Reporter: Li Gao
>
> When running Spark 1.4.1 and Phoenix 4.5.1 via the Phoenix-Spark connector, we 
> notice that some of the time (30~50%) the following error appears and kills the 
> running Spark job:
> 16/01/14 19:40:16 ERROR yarn.ApplicationMaster: User class threw exception: 
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 5 in 
> stage 110.0 failed 4 times, most recent failure: Lost task 5.3 in stage 110.0 
> (TID 35526, datanode-123.somewhere): java.lang.RuntimeException: 
> org.apache.phoenix.schema.StaleRegionBoundaryCacheException: ERROR 1108 
> (XCL08): Cache of region boundaries are out of date.
> at com.google.common.base.Throwables.propagate(Throwables.java:156)
> at 
> org.apache.phoenix.mapreduce.PhoenixRecordReader.initialize(PhoenixRecordReader.java:126)
> at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:133)
> at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:104)
> at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:66)
> at org.apache.phoenix.spark.PhoenixRDD.compute(PhoenixRDD.scala:52)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
> at org.apache.spark.scheduler.Task.run(Task.scala:70)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.phoenix.schema.StaleRegionBoundaryCacheException: ERROR 
> 1108 (XCL08): Cache of region boundaries are out of date.
> at 
> org.apache.phoenix.exception.SQLExceptionCode$13.newException(SQLExceptionCode.java:304)
> at 
> org.apache.phoenix.exception.SQLExceptionInfo.buildException(SQLExceptionInfo.java:145)
> at 
> org.apache.phoenix.util.ServerUtil.parseRemoteException(ServerUtil.java:131)
> at 
> org.apache.phoenix.util.ServerUtil.parseServerExceptionOrNull(ServerUtil.java:115)
> at 
> org.apache.phoenix.util.ServerUtil.parseServerException(ServerUtil.java:104)
> at 
> org.apache.phoenix.iterate.TableResultIterator.getDelegate(TableResultIterator.java:70)
> at 
> org.apache.phoenix.iterate.TableResultIterator.<init>(TableResultIterator.java:88)
> at 
> org.apache.phoenix.iterate.TableResultIterator.<init>(TableResultIterator.java:79)
> at 
> org.apache.phoenix.mapreduce.PhoenixRecordReader.initialize(PhoenixRecordReader.java:111)
> ... 18 more



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)