[ 
https://issues.apache.org/jira/browse/HBASE-826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12626844#action_12626844
 ] 

stack commented on HBASE-826:
-----------------------------

Ok, so I figured out the bug I was seeing AFTER applying the above J-D patch. I
thought the bug was more of what J-D's patch was supposed to be fixing, but it's
something else, something equally ugly.

Here is how I consistently generated the problem with J-D's patch in place:
{code}
1. Fill a table.
2. Stop hbase so files are flushed.
3. Start hbase.
4. Remove the table (disable/drop).
5. Stop hbase so everything is again flushed to the filesystem.
6. Look at the content of .META. using the iteratemeta script above.
7. Run the output through 'sort -u', then run the 'check' script below to match
   delete and non-delete cells.
{code}

Here's the script:
{code}
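#!/usr/bin/env ruby
# Take on STDIN the sorted and uniqued (sort -u) output of iteratemeta.rb.
# A line ending in " d" is a delete cell; any other line is a regular cell.
# Sorting puts a delete "KEY d" right after the cell "KEY" it shadows.
# Print every regular cell that is NOT followed by its matching delete.
lastline = nil
STDIN.each_line do |line|
  if line =~ /(.*?)\s+d$/
    # Delete cell: report the pending cell only if this delete is not for it.
    puts lastline unless lastline.nil? || lastline.eql?($1)
    lastline = nil
  else
    # Regular cell: the previous pending cell had no delete after it.
    puts lastline unless lastline.nil?
    lastline = line.rstrip
  end
end
# The last regular cell has no following line to check against; report it.
puts lastline unless lastline.nil?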
{code}

I was finding a few keys that should have had overshadowing deletes, but the
deletes were not present. (If I tried refilling the table, we'd eventually
fail with the 'empty HRI' complaint.)

I thought the failure was because the above J-D patch was incomplete.

It turns out it's a problem in compactions. We see it now because the
compaction algorithm changed.

Here is what is happening.

By default, max versions in the '.META.' table is *1*. The version check looks
only at the row and column components of an HStoreKey, i.e. not at the
timestamp. info:serverstartcode and info:server are edited every time we start
up, and again when we offline (disable) and delete. A delete cell has the same
timestamp as the cell it deletes, so it will usually be older than the
offlining update.
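To make the version-check point concrete, here is a toy Ruby model. It is an assumption, not HBase code: `Cell`, `trim_versions`, the row name and the timestamps are all hypothetical. Versions are counted per row/column only, ignoring the timestamp and the delete flag, with max versions 1, so an older delete cell loses to the newer offlining update.

```ruby
# Hypothetical cell model for illustration; not HBase's HStoreKey class.
Cell = Struct.new(:row, :column, :ts, :delete)

# Keep at most max_versions cells per [row, column], newest first.
# The key ignores the timestamp and the delete flag, so a delete
# marker is counted as just another version of the cell.
def trim_versions(cells, max_versions = 1)
  counts = Hash.new(0)
  cells.sort_by { |c| -c.ts }.select do |c|
    key = [c.row, c.column]
    counts[key] += 1
    counts[key] <= max_versions
  end
end

row     = "TestTable,,1"                          # hypothetical .META. row
offline = Cell.new(row, "info:server", 2, false)  # newer offlining update
tomb    = Cell.new(row, "info:server", 1, true)   # delete, same ts as original cell

kept = trim_versions([tomb, offline])
kept.each { |c| puts "#{c.column} ts=#{c.ts} delete=#{c.delete}" }
# prints: info:server ts=2 delete=false
```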

So, with our new smart compaction, we do not compact every file each time.
What I was seeing after the restarts was that the two most recent files had
been compacted on the final restart. In doing so we'd discard the delete cell
that was not for the offlining edit of info:server and info:serverstartcode,
i.e. the older delete shadowing the original cell. The original cell is back in
the biggest, oldest storefile, which that compaction did not touch, so with its
delete gone it would, on occasion, come through when doing getClosestAtOrBefore.
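A toy Ruby model of that sequence, under stated assumptions: `Cell`, `minor_compact` and `read` are hypothetical; a delete shadows the put with the same row, column and timestamp; and at equal timestamps a delete sorts before a put. Timestamp 1 stands for the original startup edit and timestamp 2 for the offlining update. Reading all files shows the row as deleted; after compacting only the two newest files, the original cell comes back.

```ruby
# Hypothetical cell model (not HBase's HStoreKey): a delete marker shadows
# the put with the same row, column and timestamp.
Cell = Struct.new(:row, :column, :ts, :delete)

# Minor compaction with a version check keyed on [row, column] only, so with
# max versions 1 every older cell, delete markers included, is dropped.
# At equal timestamps, deletes sort before puts (an assumption).
def minor_compact(files, max_versions = 1)
  counts = Hash.new(0)
  files.flatten.sort_by { |c| [-c.ts, c.delete ? 0 : 1] }.select do |c|
    key = [c.row, c.column]
    counts[key] += 1
    counts[key] <= max_versions
  end
end

# Newest live (non-deleted) put for [row, column] across all store files.
def read(files, row, column)
  cells = files.flatten.select { |c| c.row == row && c.column == column }
  deleted = cells.select(&:delete).map(&:ts)
  cells.reject { |c| c.delete || deleted.include?(c.ts) }.max_by(&:ts)
end

row    = "TestTable,,1"                              # hypothetical .META. row
oldest = [Cell.new(row, "info:server", 1, false)]    # original cell
newer1 = [Cell.new(row, "info:server", 2, false),    # offlining update
          Cell.new(row, "info:server", 2, true)]     # its delete
newer2 = [Cell.new(row, "info:server", 1, true)]     # delete of the original

before = read([oldest, newer1, newer2], row, "info:server")
puts before.inspect            # nil: every put is shadowed by a delete

compacted = minor_compact([newer1, newer2])          # two newest files only
ghost = read([oldest, compacted], row, "info:server")
puts "ghost ts=#{ghost.ts}"    # the original ts=1 cell resurfaces
```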

So, I think the fix is that we cannot remove anything when compacting, not
until we do a major compaction in which all files are in play. I'll talk to
Billy over in HBASE-834 about it. Meantime, I'll apply the J-D patch and close
this issue.
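A sketch of the rule proposed above, not necessarily what was implemented: `Cell`, `compact` and its `major:` flag are hypothetical. A compaction over a subset of files only merges and sorts; deletes are applied and versions trimmed only when `major` is true, meaning every store file is in play.

```ruby
# Hypothetical cell model; not HBase's HStoreKey class.
Cell = Struct.new(:row, :column, :ts, :delete)

# Merge store files. A minor compaction (major: false) removes nothing.
# A major compaction sees every file, so it can safely apply deletes
# and trim each [row, column] down to max_versions.
def compact(files, major:, max_versions: 1)
  cells = files.flatten.sort_by { |c| [-c.ts, c.delete ? 0 : 1] }
  return cells unless major
  deleted = cells.select(&:delete).map { |c| [c.row, c.column, c.ts] }
  live = cells.reject { |c| c.delete || deleted.include?([c.row, c.column, c.ts]) }
  counts = Hash.new(0)
  live.select { |c| (counts[[c.row, c.column]] += 1) <= max_versions }
end

row    = "TestTable,,1"                              # hypothetical .META. row
oldest = [Cell.new(row, "info:server", 1, false)]
newer1 = [Cell.new(row, "info:server", 2, false),
          Cell.new(row, "info:server", 2, true)]
newer2 = [Cell.new(row, "info:server", 1, true)]

minor = compact([newer1, newer2], major: false)
puts minor.count(&:delete)   # => 2, both delete markers retained
major = compact([oldest, minor], major: true)
puts major.size              # => 0, nothing of the dropped table remains
```

The cost of this design is that delete markers and extra versions stay on disk until the next major compaction.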

> delete table followed by recreation results in honked table
> -----------------------------------------------------------
>
>                 Key: HBASE-826
>                 URL: https://issues.apache.org/jira/browse/HBASE-826
>             Project: Hadoop HBase
>          Issue Type: Bug
>            Reporter: stack
>            Assignee: stack
>            Priority: Blocker
>             Fix For: 0.2.1, 0.18.0
>
>         Attachments: 826-v3.patch, hbase-826_0.3.0.patch
>
>
> Daniel Leffel suspected that deleting and then recreating a table causes
> issues. I tried it on our little cluster. I'm doing an MR load into the newly
> created table, and after a few million rows the MR job just hangs. It's
> looking for a region that doesn't exist:
> {code}
> 2008-08-13 03:32:36,840 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: 
> Initializing JVM Metrics with processName=MAP, sessionId=
> 2008-08-13 03:32:36,940 INFO org.apache.hadoop.mapred.MapTask: 
> numReduceTasks: 1
> 2008-08-13 03:32:37,420 DEBUG 
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Found ROOT 
> REGION => {NAME => '-ROOT-,,0', STARTKEY => '', ENDKEY => '', ENCODED => 
> 70236052, TABLE => {{NAME => '-ROOT-', IS_ROOT => 'true', IS_META => 'true', 
> FAMILIES => [{NAME => 'info', BLOOMFILTER => 'false', COMPRESSION => 'NONE', 
> VERSIONS => '1', LENGTH => '2147483647', TTL => '-1', IN_MEMORY => 'false', 
> BLOCKCACHE => 'false'}]}}
> 2008-08-13 03:32:37,541 DEBUG 
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers: reloading 
> table servers because: HRegionInfo was null or empty in .META.
> 2008-08-13 03:32:37,541 DEBUG 
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Removed 
> .META.,,1 from cache because of TestTable,0008388608,99999999999999
> 2008-08-13 03:32:37,544 DEBUG 
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Found ROOT 
> REGION => {NAME => '-ROOT-,,0', STARTKEY => '', ENDKEY => '', ENCODED => 
> 70236052, TABLE => {{NAME => '-ROOT-', IS_ROOT => 'true', IS_META => 'true', 
> FAMILIES => [{NAME => 'info', BLOOMFILTER => 'false', COMPRESSION => 'NONE', 
> VERSIONS => '1', LENGTH => '2147483647', TTL => '-1', IN_MEMORY => 'false', 
> BLOCKCACHE => 'false'}]}}
> 2008-08-13 03:32:47,605 DEBUG 
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers: reloading 
> table servers because: HRegionInfo was null or empty in .META.
> 2008-08-13 03:32:47,606 DEBUG 
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Removed 
> .META.,,1 from cache because of TestTable,0008388608,99999999999999
> ....
> {code}
> My guess is that it's a region from the table's previous incarnation, with
> ghosts left over down inside .META.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
