[ https://issues.apache.org/jira/browse/HBASE-3367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jean-Daniel Cryans resolved HBASE-3367.
---------------------------------------
Resolution: Fixed
Hadoop Flags: [Reviewed]
Committed to branch and trunk.
> Failed log split not retried
> ----------------------------
>
> Key: HBASE-3367
> URL: https://issues.apache.org/jira/browse/HBASE-3367
> Project: HBase
> Issue Type: Bug
> Reporter: Jean-Daniel Cryans
> Priority: Blocker
> Fix For: 0.90.0
>
> Attachments: HBASE-3367.patch
>
>
> Found this running TestReplication:
> {noformat}
> 2010-12-15 17:58:33,639 DEBUG [MASTER_SERVER_OPERATIONS-h17.sfo.stumble.net:58644-0] wal.HLogSplitter(299): Closed hdfs://localhost:58631/user/jdcryans/test/211477a0a924abda419b5579c7a83452/recovered.edits/0000000000000000002
> 2010-12-15 17:58:33,642 ERROR [MASTER_SERVER_OPERATIONS-h17.sfo.stumble.net:58644-0] master.MasterFileSystem(197): Failed splitting hdfs://localhost:58631/user/jdcryans/.logs/h17.sfo.stumble.net,58647,1292464631034
> java.io.IOException: Discovered orphan hlog after split. Maybe HRegionServer was not dead when we started
>     at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLog(HLogSplitter.java:290)
>     at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLog(HLogSplitter.java:151)
>     at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:193)
>     at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:96)
>     at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:151)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:680)
> 2010-12-15 17:58:33,686 INFO [MASTER_SERVER_OPERATIONS-h17.sfo.stumble.net:58644-0] handler.ServerShutdownHandler(144): Reassigning 8 region(s) that h17.sfo.stumble.net,58647,1292464631034 was carrying (skipping 0 regions(s) that are already in transition)
> {noformat}
> The split found an orphan HLog, but the exception was swallowed in
> MasterFileSystem.splitLog (it is only logged as an error), and the master then
> went on to reassign the regions without retrying the split. That is potential
> data loss.
> Another bad side effect is that those HLogs never get archived and stay in
> .logs.
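
To make the failure mode above concrete, here is a minimal sketch of the control flow the description argues for: retry the split, and only reassign regions once it has succeeded. This is not the committed HBASE-3367.patch and it does not use HBase APIs; the class LogSplitRetrySketch, the LogSplitter interface, the method names, and the retry count of 3 are all hypothetical and chosen only for illustration.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; none of these names are real HBase classes or methods. */
class LogSplitRetrySketch {

    /** Stand-in for the real log-splitting step (assumption, not an HBase interface). */
    interface LogSplitter {
        void split(String logDir) throws IOException;
    }

    /**
     * Tries the split up to maxAttempts times and rethrows the last failure,
     * so a failed split cannot be mistaken for a successful one.
     * Returns the number of attempts that were needed.
     */
    static int splitWithRetry(LogSplitter splitter, String logDir, int maxAttempts)
            throws IOException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                splitter.split(logDir);
                return attempt;
            } catch (IOException e) {
                last = e; // remember the failure and try again
            }
        }
        throw last; // every attempt failed: propagate instead of only logging
    }

    /**
     * Reassigns regions only after the split succeeded.
     * Records what happened in events and returns true if regions were reassigned.
     */
    static boolean handleServerShutdown(LogSplitter splitter, String logDir,
                                        List<String> events) {
        try {
            int attempts = splitWithRetry(splitter, logDir, 3); // 3 is an arbitrary choice
            events.add("split ok after " + attempts + " attempt(s)");
        } catch (IOException e) {
            events.add("split failed: " + e.getMessage());
            return false; // do NOT reassign: edits in the logs would be lost
        }
        events.add("reassign regions");
        return true;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Fails once with an "orphan hlog" style error, then succeeds.
        LogSplitter flaky = dir -> {
            if (calls[0]++ == 0) {
                throw new IOException("Discovered orphan hlog after split");
            }
        };
        List<String> events = new ArrayList<>();
        handleServerShutdown(flaky, "example-log-dir", events);
        events.forEach(System.out::println);
    }
}
```

The point of the sketch is the ordering: the reassignment step is unreachable unless the split returned normally, which is the property the swallowed exception in the report above broke.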