DistributedLogSplitter failing to split file because it has edits for lots of 
regions
-------------------------------------------------------------------------------------

                 Key: HBASE-5078
                 URL: https://issues.apache.org/jira/browse/HBASE-5078
             Project: HBase
          Issue Type: Bug
    Affects Versions: 0.92.0
            Reporter: stack


Testing 0.92.0RC, ran into interesting issue where a log file had edits for 
many regions and just opening the file per region was taking so long, we were 
never updating our progress and so the split of the log just kept failing; in 
this case, the first 40 edits in a file required our opening 35 files -- 
opening 35 files took longer than the hard-coded 25 seconds its supposed to 
take "acquiring" the task.

First, here is master's view:

{code}
2011-12-20 17:54:09,184 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: 
task not yet acquired 
/hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
 ver = 0
...
2011-12-20 17:54:09,233 INFO org.apache.hadoop.hbase.master.SplitLogManager: 
task 
/hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
 acquired by sv4r27s44,7003,1324365396664
...
2011-12-20 17:54:35,475 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: 
task not yet acquired 
/hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403573033
 ver = 3
{code}

Master then gives it elsewhere.

Over on the regionserver we see:

{code}
2011-12-20 17:54:09,233 INFO 
org.apache.hadoop.hbase.regionserver.SplitLogWorker: worker 
sv4r27s44,7003,1324365396664 acquired task 
/hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
....
2011-12-20 17:54:10,714 DEBUG 
org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter: 
Path=hdfs://sv4r11s38:7000/hbase/splitlog/sv4r27s44,7003,1324365396664_hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679/TestTable/6b6bfc2716dff952435ab26f018648b2/recovered.ed
its/0000000000000278862.temp, syncFs=true, hflush=false
....

{code}

.... and so on till:

{code}
2011-12-20 17:54:36,876 INFO 
org.apache.hadoop.hbase.regionserver.SplitLogWorker: task 
/hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
 preempted from sv4r27s44,7003,1324365396664, current task state and 
owner=owned sv4r28s44,7003,1324365396678

....

2011-12-20 17:54:37,112 WARN 
org.apache.hadoop.hbase.regionserver.SplitLogWorker: Failed to heartbeat the 
task/hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
....


{code}

When above happened, we'd only processed 40 edits.  As written, we only 
heatbeat every 1024 edits.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Reply via email to