[ 
https://issues.apache.org/jira/browse/HBASE-826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12622465#action_12622465
 ] 

stack commented on HBASE-826:
-----------------------------

I'm seeing that getClosestAtOrBefore is finding rows from previous
incarnations of a table.  This leaves the client stuck, emitting
'reloading .. because HRI was null or empty  in ".META."' over and over.

On table delete, not all cells on a row in .META. are removed.  If a new table
is created with the same name and schema as a just-deleted table, the first
region with a null start and end row works fine for a while.  We split and then
we may get lucky for a while but inevitably, getClosestAtOrBefore will be
queried with a row whose just-before is from a previous incarnation of the
table, only it's a cell from the historian column family or it's some old
startcode that wasn't properly removed; so getClosestAtOrBefore returns a
closest row where there is no HRI.  The client gets stuck because it keeps
asking the server for the closest row for row 'X' and it keeps getting back the
same unsatisfactory answer.
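
To make the failure mode concrete, here is a toy sketch of the lookup described
above.  It is not HBase code: a TreeMap stands in for .META., and the class
name, the 'info:regioninfo' and 'historian:split' column names, the row keys
and the HRI strings are all made up for illustration.  The only point is that
a closest-at-or-before lookup returns whatever row sorts just before the search
key, even when that row is a leftover with no HRI cell.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy model only: a sorted map stands in for .META.; none of this is HBase code.
class GhostRowSketch {
    // Hypothetical column name for the cell that holds the serialized HRI.
    static final String REGIONINFO = "info:regioninfo";

    // Returns the greatest row key <= searchKey, whatever cells that row holds.
    static Map.Entry<String, Map<String, String>> closestAtOrBefore(
            TreeMap<String, Map<String, String>> meta, String searchKey) {
        return meta.floorEntry(searchKey);
    }

    // Returns the HRI of the closest row, or null when that row has no HRI cell.
    static String locate(TreeMap<String, Map<String, String>> meta, String searchKey) {
        Map.Entry<String, Map<String, String>> row = closestAtOrBefore(meta, searchKey);
        return row == null ? null : row.getValue().get(REGIONINFO);
    }

    static TreeMap<String, Map<String, String>> sampleMeta() {
        TreeMap<String, Map<String, String>> meta = new TreeMap<>();
        // Two live regions of the recreated table.
        meta.put("TestTable,,2", Map.of(REGIONINFO, "HRI[start='', end='0005']"));
        meta.put("TestTable,0005,2", Map.of(REGIONINFO, "HRI[start='0005', end='']"));
        // Ghost from the deleted table: only a historian cell survived the delete.
        meta.put("TestTable,0008,1", Map.of("historian:split", "leftover"));
        return meta;
    }

    public static void main(String[] args) {
        TreeMap<String, Map<String, String>> meta = sampleMeta();
        // Lands on a live row, so an HRI comes back.
        System.out.println(locate(meta, "TestTable,0006,99999999999999"));
        // Lands on the ghost row, so there is no HRI, and asking again cannot change that.
        System.out.println(locate(meta, "TestTable,0009,99999999999999"));
    }
}
```

In this model the second lookup has no HRI to return no matter how often it is
repeated, which matches the client looping on the same row.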

We've had this issue since before 0.2.0 RC1.  It's what Daniel and Sebastien
were seeing just before RC2, but we tried to attribute the issue first to an
HDFS storm and then to running an instance with HBASE-751 not fixed.  We've not
hit it in test because our unit tests are regular and split in the same place
every time (so no chance for ghost regions to shine through).

I'm still trying to figure out why the deleteall doesn't remove everything.
Seems like the table has to be big for the problem to start occurring.

After figuring out the issue, we then need to make a tool that will fix up
people's .META. tables, removing any vestige of deleted tables.
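
As a rough idea of what such a fix-up pass might do, here is a toy sketch over
an in-memory map standing in for .META.  It is not HBase code and not the
planned tool: the class name and the 'info:regioninfo' column name are made up,
and it assumes that any row with no HRI cell is a leftover, which a real tool
would need to verify before deleting anything.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy model only: a sorted map stands in for .META.; none of this is HBase code.
class MetaCleanupSketch {
    // Hypothetical column name for the cell that holds the serialized HRI.
    static final String REGIONINFO = "info:regioninfo";

    // Removes every row that has no HRI cell and returns how many rows were dropped.
    // Assumption: a row without an HRI cell is a vestige of a deleted table.
    static int removeGhostRows(TreeMap<String, Map<String, String>> meta) {
        int before = meta.size();
        meta.values().removeIf(cells -> !cells.containsKey(REGIONINFO));
        return before - meta.size();
    }
}
```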

'Workaround' is not to reuse a table name until this is fixed.

> delete table followed by recreation results in honked table
> -----------------------------------------------------------
>
>                 Key: HBASE-826
>                 URL: https://issues.apache.org/jira/browse/HBASE-826
>             Project: Hadoop HBase
>          Issue Type: Bug
>            Reporter: stack
>            Priority: Blocker
>             Fix For: 0.2.1, 0.3.0
>
>
> Daniel Leffel suspected that delete and then recreate causes issues.  I tried 
> it on our little cluster.  I'm doing a MR load up into the newly created 
> table and after a few million rows, the MR job just hangs.  It's looking for a
> region that doesn't exist:
> {code}
> 2008-08-13 03:32:36,840 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: 
> Initializing JVM Metrics with processName=MAP, sessionId=
> 2008-08-13 03:32:36,940 INFO org.apache.hadoop.mapred.MapTask: 
> numReduceTasks: 1
> 2008-08-13 03:32:37,420 DEBUG 
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Found ROOT 
> REGION => {NAME => '-ROOT-,,0', STARTKEY => '', ENDKEY => '', ENCODED => 
> 70236052, TABLE => {{NAME => '-ROOT-', IS_ROOT => 'true', IS_META => 'true', 
> FAMILIES => [{NAME => 'info', BLOOMFILTER => 'false', COMPRESSION => 'NONE', 
> VERSIONS => '1', LENGTH => '2147483647', TTL => '-1', IN_MEMORY => 'false', 
> BLOCKCACHE => 'false'}]}}
> 2008-08-13 03:32:37,541 DEBUG 
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers: reloading 
> table servers because: HRegionInfo was null or empty in .META.
> 2008-08-13 03:32:37,541 DEBUG 
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Removed 
> .META.,,1 from cache because of TestTable,0008388608,99999999999999
> 2008-08-13 03:32:37,544 DEBUG 
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Found ROOT 
> REGION => {NAME => '-ROOT-,,0', STARTKEY => '', ENDKEY => '', ENCODED => 
> 70236052, TABLE => {{NAME => '-ROOT-', IS_ROOT => 'true', IS_META => 'true', 
> FAMILIES => [{NAME => 'info', BLOOMFILTER => 'false', COMPRESSION => 'NONE', 
> VERSIONS => '1', LENGTH => '2147483647', TTL => '-1', IN_MEMORY => 'false', 
> BLOCKCACHE => 'false'}]}}
> 2008-08-13 03:32:47,605 DEBUG 
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers: reloading 
> table servers because: HRegionInfo was null or empty in .META.
> 2008-08-13 03:32:47,606 DEBUG 
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Removed 
> .META.,,1 from cache because of TestTable,0008388608,99999999999999
> ....
> {code}
> My guess is that it's a region that was in the table's previous incarnation
> with ghosts left over down inside .META.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
