[ https://issues.apache.org/jira/browse/HBASE-5078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13173643#comment-13173643 ]
Zhihong Yu commented on HBASE-5078:
-----------------------------------
How about renaming everyNopenedFiles to numOpenedFilesBeforeReporting?
> DistributedLogSplitter failing to split file because it has edits for lots of
> regions
> -------------------------------------------------------------------------------------
>
> Key: HBASE-5078
> URL: https://issues.apache.org/jira/browse/HBASE-5078
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.92.0
> Reporter: stack
> Assignee: stack
> Priority: Critical
> Fix For: 0.92.0
>
> Attachments: 5078-v2.txt, 5078.txt
>
>
> Testing 0.92.0RC, ran into an interesting issue where a log file had edits for
> many regions, and just opening a file per region was taking so long that we
> were never updating our progress, so the split of the log just kept failing. In
> this case, the first 40 edits in a file required our opening 35 files, and
> opening 35 files took longer than the hard-coded 25 seconds it's supposed to
> take "acquiring" the task.
> First, here is master's view:
> {code}
> 2011-12-20 17:54:09,184 DEBUG org.apache.hadoop.hbase.master.SplitLogManager:
> task not yet acquired
> /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
> ver = 0
> ...
> 2011-12-20 17:54:09,233 INFO org.apache.hadoop.hbase.master.SplitLogManager:
> task
> /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
> acquired by sv4r27s44,7003,1324365396664
> ...
> 2011-12-20 17:54:35,475 DEBUG org.apache.hadoop.hbase.master.SplitLogManager:
> task not yet acquired
> /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403573033
> ver = 3
> {code}
> Master then gives it elsewhere.
> Over on the regionserver we see:
> {code}
> 2011-12-20 17:54:09,233 INFO
> org.apache.hadoop.hbase.regionserver.SplitLogWorker: worker
> sv4r27s44,7003,1324365396664 acquired task
> /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
> ....
> 2011-12-20 17:54:10,714 DEBUG
> org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter:
> Path=hdfs://sv4r11s38:7000/hbase/splitlog/sv4r27s44,7003,1324365396664_hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679/TestTable/6b6bfc2716dff952435ab26f018648b2/recovered.edits/0000000000000278862.temp, syncFs=true, hflush=false
> ....
> {code}
> .... and so on till:
> {code}
> 2011-12-20 17:54:36,876 INFO
> org.apache.hadoop.hbase.regionserver.SplitLogWorker: task
> /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
> preempted from sv4r27s44,7003,1324365396664, current task state and
> owner=owned sv4r28s44,7003,1324365396678
> ....
> 2011-12-20 17:54:37,112 WARN
> org.apache.hadoop.hbase.regionserver.SplitLogWorker: Failed to heartbeat the
> task/hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
> ....
> {code}
> When the above happened, we'd only processed 40 edits. As written, we only
> heartbeat every 1024 edits.
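
For illustration only, here is a rough sketch of the idea under discussion:
report progress not just every N edits but also every N newly opened output
files, so that a log touching many regions still heartbeats in time. This is
not the code in the attached patches. The class SplitProgressTracker, its
methods, and the reporter callback are hypothetical names; only
numOpenedFilesBeforeReporting (the name suggested above) and the 1024-edit
interval come from this thread, and the file threshold is whatever the caller
picks.

```java
import java.util.function.BooleanSupplier;

/**
 * Hypothetical helper (not an HBase class): decides when a log splitter
 * should report progress so the master does not consider the task stalled.
 */
class SplitProgressTracker {
    private final int editsBeforeReporting;          // e.g. 1024, per the description
    private final int numOpenedFilesBeforeReporting; // assumed threshold, caller's choice
    private final BooleanSupplier reporter;          // returns false if the task was lost
    private int editsSinceReport = 0;
    private int filesOpenedSinceReport = 0;

    SplitProgressTracker(int editsBeforeReporting,
                         int numOpenedFilesBeforeReporting,
                         BooleanSupplier reporter) {
        if (editsBeforeReporting <= 0 || numOpenedFilesBeforeReporting <= 0) {
            throw new IllegalArgumentException("thresholds must be positive");
        }
        this.editsBeforeReporting = editsBeforeReporting;
        this.numOpenedFilesBeforeReporting = numOpenedFilesBeforeReporting;
        this.reporter = reporter;
    }

    /** Call after each edit is processed; returns false if the heartbeat failed. */
    boolean editProcessed() {
        editsSinceReport++;
        return maybeReport();
    }

    /** Call after each new output file is opened; returns false if the heartbeat failed. */
    boolean fileOpened() {
        filesOpenedSinceReport++;
        return maybeReport();
    }

    /** Reports when either counter reaches its threshold, then resets both counters. */
    private boolean maybeReport() {
        if (editsSinceReport < editsBeforeReporting
                && filesOpenedSinceReport < numOpenedFilesBeforeReporting) {
            return true; // nothing to report yet
        }
        editsSinceReport = 0;
        filesOpenedSinceReport = 0;
        return reporter.getAsBoolean();
    }
}
```

With an edit interval of 1024 alone, the 40 edits in the failing case would
never trigger a report. With a small file threshold as well, opening the 35
files would trigger several reports along the way.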