[ 
https://issues.apache.org/jira/browse/HBASE-5078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-5078:
-------------------------

    Status: Patch Available  (was: Open)
    
> DistributedLogSplitter failing to split file because it has edits for lots of 
> regions
> -------------------------------------------------------------------------------------
>
>                 Key: HBASE-5078
>                 URL: https://issues.apache.org/jira/browse/HBASE-5078
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.92.0
>            Reporter: stack
>            Assignee: stack
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: 5078-v2.txt, 5078.txt
>
>
> Testing 0.92.0RC, ran into an interesting issue where a log file had edits for 
> many regions, and just opening one file per region took so long that we 
> never updated our progress, so the split of the log kept failing. In 
> this case, the first 40 edits in the file required opening 35 files, and 
> opening 35 files took longer than the hard-coded 25 seconds that "acquiring" 
> the task is supposed to take.
> First, here is master's view:
> {code}
> 2011-12-20 17:54:09,184 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: 
> task not yet acquired 
> /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
>  ver = 0
> ...
> 2011-12-20 17:54:09,233 INFO org.apache.hadoop.hbase.master.SplitLogManager: 
> task 
> /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
>  acquired by sv4r27s44,7003,1324365396664
> ...
> 2011-12-20 17:54:35,475 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: 
> task not yet acquired 
> /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403573033
>  ver = 3
> {code}
> Master then hands the task to another worker.
> Over on the regionserver we see:
> {code}
> 2011-12-20 17:54:09,233 INFO 
> org.apache.hadoop.hbase.regionserver.SplitLogWorker: worker 
> sv4r27s44,7003,1324365396664 acquired task 
> /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
> ....
> 2011-12-20 17:54:10,714 DEBUG 
> org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter: 
> Path=hdfs://sv4r11s38:7000/hbase/splitlog/sv4r27s44,7003,1324365396664_hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679/TestTable/6b6bfc2716dff952435ab26f018648b2/recovered.edits/0000000000000278862.temp, syncFs=true, hflush=false
> ....
> {code}
> .... and so on till:
> {code}
> 2011-12-20 17:54:36,876 INFO 
> org.apache.hadoop.hbase.regionserver.SplitLogWorker: task 
> /hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
>  preempted from sv4r27s44,7003,1324365396664, current task state and 
> owner=owned sv4r28s44,7003,1324365396678
> ....
> 2011-12-20 17:54:37,112 WARN 
> org.apache.hadoop.hbase.regionserver.SplitLogWorker: Failed to heartbeat the 
> task/hbase/splitlog/hdfs%3A%2F%2Fsv4r11s38%3A7000%2Fhbase%2F.logs%2Fsv4r31s44%2C7003%2C1324365396770-splitting%2Fsv4r31s44%252C7003%252C1324365396770.1324403487679
> ....
> {code}
> When the above happened, we'd only processed 40 edits.  As written, we only 
> heartbeat every 1024 edits.
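
One way to avoid the starvation described above is to heartbeat on elapsed time as well as on edit count, and to check again after each slow step such as opening a recovered-edits writer. The following is a minimal sketch of that idea, not the attached patch and not HBase code: the class ProgressThrottle, its methods, and the 5-second interval used in the usage note are all hypothetical names and values chosen for illustration.

```java
import java.util.function.BooleanSupplier;
import java.util.function.LongSupplier;

/** Hypothetical helper for illustration only; not part of HBase. */
final class ProgressThrottle {
    private final int editInterval;          // heartbeat at least every N edits
    private final long maxSilenceMillis;     // ...or once this much time has passed
    private final LongSupplier clock;        // injectable clock, e.g. System::currentTimeMillis
    private final BooleanSupplier heartbeat; // returns false if the task was lost

    private int editsSinceReport = 0;
    private long lastReportMillis;

    ProgressThrottle(int editInterval, long maxSilenceMillis,
                     LongSupplier clock, BooleanSupplier heartbeat) {
        this.editInterval = editInterval;
        this.maxSilenceMillis = maxSilenceMillis;
        this.clock = clock;
        this.heartbeat = heartbeat;
        this.lastReportMillis = clock.getAsLong();
    }

    /** Call after each edit is written. Returns false if the task was lost. */
    boolean afterEdit() {
        editsSinceReport++;
        return maybeReport();
    }

    /** Call after a slow step such as opening a new output file. */
    boolean afterSlowOperation() {
        return maybeReport();
    }

    private boolean maybeReport() {
        long now = clock.getAsLong();
        boolean enoughEdits = editsSinceReport >= editInterval;
        boolean tooQuiet = now - lastReportMillis >= maxSilenceMillis;
        if (!enoughEdits && !tooQuiet) {
            return true; // nothing to report yet; keep splitting
        }
        editsSinceReport = 0;
        lastReportMillis = now;
        return heartbeat.getAsBoolean();
    }
}
```

With a count-only rule of 1024 edits, a worker that spends most of its time opening files for the first 40 edits never heartbeats. With a time bound added, for example new ProgressThrottle(1024, 5000L, System::currentTimeMillis, reporter) where reporter is whatever callback updates the task, the same worker heartbeats every few seconds no matter how few edits it has written.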
