[
https://issues.apache.org/jira/browse/HBASE-826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12622465#action_12622465
]
stack commented on HBASE-826:
-----------------------------
I'm seeing that getClosestAtOrBefore is finding rows from previous
incarnations of a table. This leaves the client stuck, emitting
'reloading .. because HRI was null or empty in ".META."' over and over.
On table delete, not all cells on a row in .META. are removed. If a new table
is created with the same name and schema as a just-deleted table, the first
region with a null start and end row works fine for a while. We split and then
we may get lucky for a while, but inevitably getClosestAtOrBefore will be
queried with a row whose just-before is from a previous incarnation of the
table, only it's a cell from the historian column family or it's some old
startcode that wasn't properly removed, so getClosestAtOrBefore returns a
closest row where there is no HRI. The client gets stuck because it keeps
asking the server for the closest row for row 'X' and it keeps getting back
the same unsatisfactory answer.
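To make the failure mode concrete, here is a minimal sketch of it using a
sorted map in place of .META.. It is a toy model under stated assumptions, not
HBase code: the class and method names are made up for illustration, the row
keys and the 'info:regioninfo' and 'info:serverstartcode' column names are
assumed, and plain string ordering stands in for the real row comparator.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Toy model only: MetaLookupSketch and its methods are hypothetical and are
// not the HBase API. Plain string ordering stands in for the real comparator.
class MetaLookupSketch {
    // Sorted catalog standing in for .META.: row key -> (column -> value).
    static final TreeMap<String, Map<String, String>> META = new TreeMap<>();

    static {
        // Regions of the current incarnation carry a region info cell.
        put("TestTable,,2000", "info:regioninfo", "start='' end='0004000000'");
        put("TestTable,0004000000,2000", "info:regioninfo", "start='0004000000' end=''");
        // Ghost row from the deleted incarnation: its region info cell is
        // gone, but an old start code cell was left behind.
        put("TestTable,0008000000,1000", "info:serverstartcode", "1218500000000");
    }

    static void put(String row, String column, String value) {
        META.computeIfAbsent(row, k -> new HashMap<>()).put(column, value);
    }

    // Greatest row key that sorts at or before the probe, or null if none.
    static String closestAtOrBefore(String probe) {
        return META.floorKey(probe);
    }

    // True only if the row exists and still has its region info cell.
    static boolean hasRegionInfo(String row) {
        if (row == null) {
            return false;
        }
        Map<String, String> cells = META.get(row);
        return cells != null && cells.containsKey("info:regioninfo");
    }

    public static void main(String[] args) {
        String probe = "TestTable,0008388608,99999999999999";
        // Asking again cannot help: the catalog has not changed, so every
        // retry lands on the same ghost row. Retries are capped for the demo.
        for (int attempt = 1; attempt <= 3; attempt++) {
            String row = closestAtOrBefore(probe);
            System.out.println("attempt " + attempt + ": closest=" + row
                + " hasRegionInfo=" + hasRegionInfo(row));
        }
    }
}
```

In this model the ghost row sorts between the live region that should serve
the probe and the probe itself, so it shadows the live region on every lookup.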
We've had this issue since before 0.2.0 RC1. It's what Daniel and Sebastien
were seeing just before RC2, but we tried to attribute the issue first to an
HDFS storm and to the running of an instance with HBASE-751 not fixed. We've
not had it in test because our unit tests are regular and split in the same
place every time (so no chance for ghost regions to shine through).
I'm still trying to figure out how we fail to deleteall successfully. It seems
the table has to be big for the problem to start occurring.
After figuring out the issue, we then need to make a tool that will fix up
people's .META. tables, removing any vestige of deleted tables.
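As a rough idea of what such a fix-up pass could look like, here is a sketch
over an in-memory stand-in for .META.. It assumes one possible criterion, that
a row with no 'info:regioninfo' cell is a vestige. That criterion, the class
and method names, and the sample rows are all assumptions for illustration; a
real tool would have to scan and delete through the HBase client and would
need a safer test for what counts as a leftover.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model only: MetaCleanupSketch is hypothetical and is not the HBase API.
class MetaCleanupSketch {
    // Drop every row that lacks a region info cell and report what was dropped.
    static List<String> removeRowsWithoutRegionInfo(
            TreeMap<String, Map<String, String>> meta) {
        List<String> removed = new ArrayList<>();
        Iterator<Map.Entry<String, Map<String, String>>> it =
            meta.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Map<String, String>> entry = it.next();
            if (!entry.getValue().containsKey("info:regioninfo")) {
                removed.add(entry.getKey());
                it.remove(); // safe removal while iterating
            }
        }
        return removed;
    }

    // Two live rows plus one leftover row holding only a historian cell.
    static TreeMap<String, Map<String, String>> sampleMeta() {
        TreeMap<String, Map<String, String>> meta = new TreeMap<>();
        meta.put("TestTable,,2000", cells("info:regioninfo", "live"));
        meta.put("TestTable,0004000000,2000", cells("info:regioninfo", "live"));
        meta.put("TestTable,0008000000,1000", cells("historian:split", "old"));
        return meta;
    }

    static Map<String, String> cells(String column, String value) {
        Map<String, String> row = new HashMap<>();
        row.put(column, value);
        return row;
    }

    public static void main(String[] args) {
        TreeMap<String, Map<String, String>> meta = sampleMeta();
        List<String> removed = removeRowsWithoutRegionInfo(meta);
        System.out.println("removed: " + removed);
        System.out.println("remaining: " + meta.keySet());
    }
}
```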
'Workaround' is not to reuse a table name until this is fixed.
> delete table followed by recreation results in honked table
> -----------------------------------------------------------
>
> Key: HBASE-826
> URL: https://issues.apache.org/jira/browse/HBASE-826
> Project: Hadoop HBase
> Issue Type: Bug
> Reporter: stack
> Priority: Blocker
> Fix For: 0.2.1, 0.3.0
>
>
> Daniel Leffel suspected that delete and then recreate causes issues. I tried
> it on our little cluster. I'm doing a MR load up into the newly created
> table and after a few million rows, the MR job just hangs. It's looking for a
> region that doesn't exist:
> {code}
> 2008-08-13 03:32:36,840 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
> Initializing JVM Metrics with processName=MAP, sessionId=
> 2008-08-13 03:32:36,940 INFO org.apache.hadoop.mapred.MapTask:
> numReduceTasks: 1
> 2008-08-13 03:32:37,420 DEBUG
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Found ROOT
> REGION => {NAME => '-ROOT-,,0', STARTKEY => '', ENDKEY => '', ENCODED =>
> 70236052, TABLE => {{NAME => '-ROOT-', IS_ROOT => 'true', IS_META => 'true',
> FAMILIES => [{NAME => 'info', BLOOMFILTER => 'false', COMPRESSION => 'NONE',
> VERSIONS => '1', LENGTH => '2147483647', TTL => '-1', IN_MEMORY => 'false',
> BLOCKCACHE => 'false'}]}}
> 2008-08-13 03:32:37,541 DEBUG
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers: reloading
> table servers because: HRegionInfo was null or empty in .META.
> 2008-08-13 03:32:37,541 DEBUG
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Removed
> .META.,,1 from cache because of TestTable,0008388608,99999999999999
> 2008-08-13 03:32:37,544 DEBUG
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Found ROOT
> REGION => {NAME => '-ROOT-,,0', STARTKEY => '', ENDKEY => '', ENCODED =>
> 70236052, TABLE => {{NAME => '-ROOT-', IS_ROOT => 'true', IS_META => 'true',
> FAMILIES => [{NAME => 'info', BLOOMFILTER => 'false', COMPRESSION => 'NONE',
> VERSIONS => '1', LENGTH => '2147483647', TTL => '-1', IN_MEMORY => 'false',
> BLOCKCACHE => 'false'}]}}
> 2008-08-13 03:32:47,605 DEBUG
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers: reloading
> table servers because: HRegionInfo was null or empty in .META.
> 2008-08-13 03:32:47,606 DEBUG
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Removed
> .META.,,1 from cache because of TestTable,0008388608,99999999999999
> ....
> {code}
> My guess is that it's a region that was in the table's previous incarnation
> with ghosts left over down inside .META.