[ https://issues.apache.org/jira/browse/HBASE-5078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
stack updated HBASE-5078:
-------------------------

    Status: Patch Available  (was: Open)

> DistributedLogSplitter failing to split file because it has edits for lots of regions
> -------------------------------------------------------------------------------------
>
>                 Key: HBASE-5078
>                 URL: https://issues.apache.org/jira/browse/HBASE-5078
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.92.0
>            Reporter: stack
>            Assignee: stack
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: 5078-v2.txt, 5078.txt
>
>
> Testing 0.92.0RC, ran into an interesting issue where a log file had edits for
> many regions and just opening the file per region was taking so long, we were
> never updating our progress and so the split of the log just kept failing; in
> this case, the first 40 edits in a file required our opening 35 files --
> opening 35 files took longer than the hard-coded 25 seconds it's supposed to
> take "acquiring" the task.
> First, here is master's view:
> {code}
> 2011-12-20 17:54:09,184 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: task not yet acquired /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679 ver = 0
> ...
> 2011-12-20 17:54:09,233 INFO org.apache.hadoop.hbase.master.SplitLogManager: task /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679 acquired by sv4r27s44,7003,1324365396664
> ...
> 2011-12-20 17:54:35,475 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: task not yet acquired /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403573033 ver = 3
> {code}
> Master then gives it elsewhere.
> Over on the regionserver we see:
> {code}
> 2011-12-20 17:54:09,233 INFO org.apache.hadoop.hbase.regionserver.SplitLogWorker: worker sv4r27s44,7003,1324365396664 acquired task /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
> ....
> 2011-12-20 17:54:10,714 DEBUG org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter: Path=hdfs://sv4r11s38:7000/hbase/splitlog/sv4r27s44,7003,1324365396664_hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679/TestTable/6b6bfc2716dff952435ab26f018648b2/recovered.edits/0000000000000278862.temp, syncFs=true, hflush=false
> ....
> {code}
> .... and so on till:
> {code}
> 2011-12-20 17:54:36,876 INFO org.apache.hadoop.hbase.regionserver.SplitLogWorker: task /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679 preempted from sv4r27s44,7003,1324365396664, current task state and owner=owned sv4r28s44,7003,1324365396678
> ....
> 2011-12-20 17:54:37,112 WARN org.apache.hadoop.hbase.regionserver.SplitLogWorker: Failed to heartbeat the task/hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
> ....
> {code}
> When the above happened, we'd only processed 40 edits. As written, we only
> heartbeat every 1024 edits.
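
The failure described above comes from tying the heartbeat to an edit count alone: 40 edits that each trigger a slow file open never reach the 1024-edit mark before the 25-second window closes. One possible way to avoid that is to heartbeat on elapsed time as well as on count. The sketch below is only an illustration of that idea, not the attached patch and not HBase code; the class `SplitProgress`, its methods, and the 5-second silence limit used in the usage note are all hypothetical.

```java
import java.util.function.LongSupplier;

/** Hypothetical helper (not HBase code): heartbeat on edit count OR elapsed time. */
class SplitProgress {
    private final int editsPerReport;     // e.g. 1024, the count mentioned in the issue
    private final long maxSilenceMillis;  // assumed limit, should sit well under the 25 s window
    private final LongSupplier clock;     // injectable time source so the logic is testable
    private final Runnable heartbeat;     // stand-in for whatever reports task progress

    private int editsSinceReport = 0;
    private long lastReportMillis;

    SplitProgress(int editsPerReport, long maxSilenceMillis,
                  LongSupplier clock, Runnable heartbeat) {
        this.editsPerReport = editsPerReport;
        this.maxSilenceMillis = maxSilenceMillis;
        this.clock = clock;
        this.heartbeat = heartbeat;
        this.lastReportMillis = clock.getAsLong();
    }

    /** Call once per edit written; returns true if a heartbeat was sent. */
    boolean onEdit() {
        editsSinceReport++;
        return maybeReport();
    }

    /** Call after any slow step, such as opening a per-region output file. */
    boolean onSlowStep() {
        return maybeReport();
    }

    private boolean maybeReport() {
        long now = clock.getAsLong();
        boolean countDue = editsSinceReport >= editsPerReport;
        boolean timeDue = now - lastReportMillis >= maxSilenceMillis;
        if (!countDue && !timeDue) {
            return false;
        }
        heartbeat.run();
        editsSinceReport = 0;
        lastReportMillis = now;
        return true;
    }
}
```

A caller might construct it as `new SplitProgress(1024, 5000L, System::currentTimeMillis, task::heartbeat)`, where `task::heartbeat` is a placeholder for the real progress call, and invoke `onSlowStep()` after each writer is opened so that a run of slow opens still reports progress.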