[
https://issues.apache.org/jira/browse/HBASE-8505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13663442#comment-13663442
]
Enis Soztutar commented on HBASE-8505:
--------------------------------------
From the logs at
https://builds.apache.org/job/HBase-0.94-security/ws/trunk/target/surefire-reports/org.apache.hadoop.hbase.client.TestMetaScanner-output.txt,
I think I understand what is going on:
BlockingMetaScannerVisitor blocks and waits for the split daughter to appear
when it sees a parent region (HBASE-5986). CatalogJanitor, on the other hand,
orders the regions in a (kind-of) topological sort (based on the parent-child
relation) to guarantee that parents are not GC'd before daughters.
What is happening in this issue is not related to the patch in this jira, but
the test triggers this extremely rare case by running CatalogJanitor, splits,
and metascanners concurrently. We have parent, splita and splitb regions, and
CatalogJanitor decides to delete the parent first and then splitb in the same
run. Meanwhile, a concurrent metascanner goes over the parent and sees that it
is split, but before it can read the split daughter, CatalogJanitor deletes
both the parent and the daughter. The metascanner then blocks until it times
out, which fails the test.
One solution might be to also check, in BlockingMetaScannerVisitor, whether
the parent is still there while we are blocking for the daughter.
The good thing is that with HBASE-7721, we don't need any of this in trunk.
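
A minimal sketch of the "also check the parent" idea, not the actual
MetaScanner code. It assumes a plain in-memory map standing in for META, and
the class, enum, and method names (DaughterWaitSketch, Outcome,
waitForDaughter) are hypothetical. The point is only that the wait loop stops
as soon as the parent row disappears instead of blocking until the timeout:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration only: a map stands in for the META table.
class DaughterWaitSketch {
    enum Outcome { DAUGHTER_FOUND, PARENT_GONE, TIMED_OUT, INTERRUPTED }

    // Polls for the daughter row, but gives up early if the parent row
    // has been deleted (e.g. by a concurrent janitor run).
    static Outcome waitForDaughter(Map<String, String> meta, String parentRow,
                                   String daughterRow, long timeoutMs, long pollMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            if (meta.containsKey(daughterRow)) {
                return Outcome.DAUGHTER_FOUND;
            }
            if (!meta.containsKey(parentRow)) {
                // Re-check the daughter once: it may have appeared between
                // the two lookups above.
                return meta.containsKey(daughterRow)
                        ? Outcome.DAUGHTER_FOUND : Outcome.PARENT_GONE;
            }
            if (System.nanoTime() - deadline >= 0) {
                return Outcome.TIMED_OUT;
            }
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                return Outcome.INTERRUPTED;
            }
        }
    }

    public static void main(String[] args) {
        Map<String, String> meta = new ConcurrentHashMap<>();

        // Parent and daughter both present: returns immediately.
        meta.put("parent", "split");
        meta.put("splitb", "open");
        System.out.println(waitForDaughter(meta, "parent", "splitb", 200, 5));

        // Janitor removed both rows: the wait ends without hitting the timeout.
        meta.clear();
        System.out.println(waitForDaughter(meta, "parent", "splitb", 200, 5));

        // Parent present, daughter never shows up: falls back to the timeout.
        meta.put("parent", "split");
        System.out.println(waitForDaughter(meta, "parent", "splitb", 50, 5));
    }
}
```

With a check like this, the case described above would end with something
like PARENT_GONE rather than a timeout, and the caller could then skip the
row instead of failing.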
> References to split daughters should not be deleted separately from parent
> META entry
> -------------------------------------------------------------------------------------
>
> Key: HBASE-8505
> URL: https://issues.apache.org/jira/browse/HBASE-8505
> Project: HBase
> Issue Type: Bug
> Components: regionserver
> Reporter: Enis Soztutar
> Assignee: Enis Soztutar
> Fix For: 0.98.0, 0.94.8, 0.95.1
>
> Attachments: hbase-8505_v1-0.94.patch, hbase-8505_v2-0.94.patch,
> hbase-8505_v2-0.94-reduced.patch, hbase-8505_v2.patch, hbase-8505_v2.patch
>
>
> In CatalogJanitor, we clean up the parent regions whose daughters do not
> have any more references to their parent regions. In doing so, we do two
> Deletes: one for removing the split daughter columns, and the other for
> removing the row.
> The first one seems unnecessary, and causes an NPE from a concurrent MetaScanner.
> Stack trace:
> {code}
> 2013-05-07 04:49:40,828|machine|INFO|Exception in thread "main"
> java.lang.NullPointerException
> 2013-05-07 04:49:40,828|machine|INFO|at
> org.apache.hadoop.hbase.util.Writables.getWritable(Writables.java:103)
> 2013-05-07 04:49:40,828|machine|INFO|at
> org.apache.hadoop.hbase.util.Writables.getHRegionInfo(Writables.java:147)
> 2013-05-07 04:49:40,829|machine|INFO|at
> org.apache.hadoop.hbase.client.MetaScanner$BlockingMetaScannerVisitor.processRow(MetaScanner.java:406)
> 2013-05-07 04:49:40,829|machine|INFO|at
> org.apache.hadoop.hbase.client.MetaScanner$TableMetaScannerVisitor.processRow(MetaScanner.java:487)
> 2013-05-07 04:49:40,830|machine|INFO|at
> org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:224)
> 2013-05-07 04:49:40,830|machine|INFO|at
> org.apache.hadoop.hbase.client.MetaScanner.access$000(MetaScanner.java:54)
> 2013-05-07 04:49:40,830|machine|INFO|at
> org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:133)
> 2013-05-07 04:49:40,831|machine|INFO|at
> org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:130)
> 2013-05-07 04:49:40,831|machine|INFO|at
> org.apache.hadoop.hbase.client.HConnectionManager.execute(HConnectionManager.java:384)
> 2013-05-07 04:49:40,831|machine|INFO|at
> org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:130)
> 2013-05-07 04:49:40,832|machine|INFO|at
> org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:105)
> 2013-05-07 04:49:40,832|machine|INFO|at
> org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:83)
> 2013-05-07 04:49:40,832|machine|INFO|at
> org.apache.hadoop.hbase.client.MetaScanner.allTableRegions(MetaScanner.java:323)
> 2013-05-07 04:49:40,833|machine|INFO|at
> org.apache.hadoop.hbase.client.HTable.getRegionLocations(HTable.java:485)
> 2013-05-07 04:49:40,833|machine|INFO|at
> org.apache.hadoop.hbase.client.HTable.getStartEndKeys(HTable.java:438)
> {code}
> Master is doing the CatalogJanitor concurrently:
> {code}
> 2013-05-07 04:49:40,636 INFO org.apache.hadoop.hbase.catalog.MetaEditor:
> Deleted daughters references, qualifier=splitA and qualifier=splitB, from
> parent
> IntegrationTestBigLinkedList,\x07\xFB\x98\xB7a\x89\xF5\xE6,1367898577620.4ef1329ff0e8911db998ac8ccd32108d.
> 2013-05-07 04:49:40,666 INFO org.apache.hadoop.hbase.catalog.MetaEditor:
> Deleted region
> IntegrationTestBigLinkedList,\x07\xFB\x98\xB7a\x89\xF5\xE6,1367898577620.4ef1329ff0e8911db998ac8ccd32108d.
> from META
> 2013-05-07 04:49:40,690 INFO org.apache.hadoop.hbase.catalog.MetaEditor:
> Deleted daughters references, qualifier=splitA and qualifier=splitB, from
> parent
> IntegrationTestBigLinkedList,\x0B\xF8n\xEA\xD3\xAA\xA9\x92,1367898577620.b502376df2623cb0be3f0c1664d799a6.
> 2013-05-07 04:49:40,716 INFO org.apache.hadoop.hbase.catalog.MetaEditor:
> Deleted region
> IntegrationTestBigLinkedList,\x0B\xF8n\xEA\xD3\xAA\xA9\x92,1367898577620.b502376df2623cb0be3f0c1664d799a6.
> from META
> 2013-05-07 04:49:40,742 INFO org.apache.hadoop.hbase.catalog.MetaEditor:
> Deleted daughters references, qualifier=splitA and qualifier=splitB, from
> parent
> IntegrationTestBigLinkedList,\x17\xF5\x11\xB9\xE3\xDB)\x0C,1367898541729.ec2df58fafb823cec6e793ba35e2241d.
> {code}
> This is critical for 0.94, but not for 0.95 and trunk due to HBASE-7721.