Wellington Chevreuil created HBASE-23238:
--------------------------------------------

             Summary: Additional Test and checks for null references on ScannerCallableWithReplicas
                 Key: HBASE-23238
                 URL: https://issues.apache.org/jira/browse/HBASE-23238
             Project: HBase
          Issue Type: Improvement
    Affects Versions: 1.2.12
            Reporter: Wellington Chevreuil
            Assignee: Wellington Chevreuil


One of our customers running a 1.2-based version is hitting an NPE when 
scanning data from an MR job. It happens while the map task is finalising:
{noformat}
...
2019-09-10 14:17:22,238 INFO [main] org.apache.hadoop.mapred.MapTask: Ignoring exception during close for org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader@3a5b7d7e
java.lang.NullPointerException
        at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.setClose(ScannerCallableWithReplicas.java:99)
        at org.apache.hadoop.hbase.client.ClientScanner.close(ClientScanner.java:730)
        at org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl.close(TableRecordReaderImpl.java:178)
        at org.apache.hadoop.hbase.mapreduce.TableRecordReader.close(TableRecordReader.java:89)
        at org.apache.hadoop.hbase.mapreduce.MultiTableInputFormatBase$1.close(MultiTableInputFormatBase.java:112)
        at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.close(MapTask.java:529)
        at org.apache.hadoop.mapred.MapTask.closeQuietly(MapTask.java:2039)
...
2019-09-10 14:18:24,601 FATAL [IPC Server handler 5 on 35745] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: attempt_1566832111959_6047_m_000000_3 - exited : java.lang.NullPointerException
        at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.setClose(ScannerCallableWithReplicas.java:99)
        at org.apache.hadoop.hbase.client.ClientScanner.nextScanner(ClientScanner.java:264)
        at org.apache.hadoop.hbase.client.ClientScanner.possiblyNextScanner(ClientScanner.java:248)
        at org.apache.hadoop.hbase.client.ClientScanner.loadCache(ClientScanner.java:542)
        at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:371)
        at org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl.nextKeyValue(TableRecordReaderImpl.java:222)
        at org.apache.hadoop.hbase.mapreduce.TableRecordReader.nextKeyValue(TableRecordReader.java:147)
        at org.apache.hadoop.hbase.mapreduce.MultiTableInputFormatBase$1.nextKeyValue(MultiTableInputFormatBase.java:139)
...

{noformat}

After some investigation, we found that 1.2-based deployments consistently 
hit this problem when both of the following conditions hold:
1) The total size of the KVs the scan would return is larger than 
*max result size*;
2) At the same time, the scan filter has been exhausted, so all remaining KVs 
are filtered out and not returned.

We could reproduce this with the UT proposed in this PR. When checking 
newer branches, though, I verified that this specific problem is not present 
there. I believe it was indirectly fixed by the changes from HBASE-17489.

Nevertheless, I think it would still be useful to have this extra test and 
checks added as a safeguard measure.
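
To illustrate the kind of null check meant here, below is a minimal, 
self-contained sketch. It is not the actual patch and does not use HBase 
classes: RegionCallable and ReplicaAwareCallable are hypothetical stand-ins, 
and the assumption (based only on the stack traces above) is that setClose() 
delegates to an inner callable that may not have been set yet.

```java
class NullSafeCloseSketch {

    /** Hypothetical stand-in for a per-region scanner callable. */
    static class RegionCallable {
        private boolean closed;

        void setClose() {
            closed = true;
        }

        boolean isClosed() {
            return closed;
        }
    }

    /** Hypothetical wrapper that delegates to whichever callable is active. */
    static class ReplicaAwareCallable {
        // Assumption: stays null until some call has selected a delegate.
        private RegionCallable current;

        void select(RegionCallable callable) {
            current = callable;
        }

        /** Unguarded delegation: throws NullPointerException if no delegate is set. */
        void setCloseUnguarded() {
            current.setClose();
        }

        /** Guarded delegation: with no delegate there is nothing to close. */
        void setClose() {
            if (current != null) {
                current.setClose();
            }
        }
    }

    public static void main(String[] args) {
        ReplicaAwareCallable wrapper = new ReplicaAwareCallable();

        // No delegate selected yet: the guarded variant is a no-op.
        wrapper.setClose();

        // Once a delegate is selected, the close flag is propagated to it.
        RegionCallable delegate = new RegionCallable();
        wrapper.select(delegate);
        wrapper.setClose();
        System.out.println("delegate closed: " + delegate.isClosed());
    }
}
```

With a guard like this, a close issued before any delegate exists becomes a 
no-op instead of failing the task, which is the safeguard behaviour the extra 
test would be expected to cover.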
 



