[ https://issues.apache.org/jira/browse/HBASE-3412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12977570#action_12977570 ]
Jean-Daniel Cryans commented on HBASE-3412:
-------------------------------------------
bq. The test doesn't assert anything. How do I know it was successful? Should you
check the FS for anything?
Since I mock the FS to get the LeaseExpiredException (LEE), nothing changes on
the filesystem. Do you think I should check that everything is still where it's
supposed to be, i.e. that all the logs are still in the logs folder?
bq. This seems dangerous. Is it?
Less dangerous than not handling it, IMO, since currently it cancels the whole
log replay process. If your concern is that we might miss other kinds of
exceptions hidden in an LEE, then we could grep the exception message for
"File does not exist" and otherwise let the exception propagate like it
currently does. It really bothers me to do that, though, since it cancels log
splitting and guarantees data loss even if the logs after the one that throws
the exception were fine.
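To make the "grep the message" option concrete, here is a rough sketch of the idea, not the attached patch. It assumes the only signal we have is the "File does not exist" text somewhere in the cause chain. The names MissingLogSketch, LogOpener, isMissingFile and splitAll are made up for illustration and are not HBase or HDFS APIs.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

class MissingLogSketch {

    /** Hypothetical stand-in for whatever opens or recovers a log; not an HBase API. */
    interface LogOpener {
        void open(String logPath) throws IOException;
    }

    /** True if any exception in the cause chain mentions "File does not exist". */
    static boolean isMissingFile(Throwable t) {
        while (t != null) {
            String msg = t.getMessage();
            if (msg != null && msg.contains("File does not exist")) {
                return true;
            }
            t = t.getCause();
        }
        return false;
    }

    /**
     * Tries every log in order. A log that has vanished is skipped;
     * any other IOException is rethrown, as happens today.
     * Returns the logs that were opened successfully.
     */
    static List<String> splitAll(List<String> logs, LogOpener opener) throws IOException {
        List<String> processed = new ArrayList<>();
        for (String log : logs) {
            try {
                opener.open(log);
                processed.add(log);
            } catch (IOException e) {
                if (!isMissingFile(e)) {
                    throw e; // unknown failure: keep the current behavior
                }
                // Assumption: the log was archived concurrently, so move on to the next one.
            }
        }
        return processed;
    }
}
```

Matching on the message string is brittle, which is part of why I'm not thrilled with it, but it keeps every other failure mode visible.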
> HLogSplitter should handle missing HLogs
> ----------------------------------------
>
> Key: HBASE-3412
> URL: https://issues.apache.org/jira/browse/HBASE-3412
> Project: HBase
> Issue Type: Bug
> Reporter: Jean-Daniel Cryans
> Priority: Critical
> Fix For: 0.90.0
>
> Attachments: HBASE-3412.patch
>
>
> In build #48 (https://hudson.apache.org/hudson/job/hbase-0.90/48/),
> TestReplication failed because of missing rows on the slave cluster. The
> reason is that a region server that was killed was able to archive a log at
> the same time the master was trying to recover it:
> {noformat}
> [MASTER_META_SERVER_OPERATIONS-vesta.apache.org:47907-0] util.FSUtils(625):
> Recovering file
> hdfs://localhost:50121/user/hudson/.logs/vesta.apache.org,58598,1294117333857/vesta.apache.org%3A58598.1294117406909
> ...
> [RegionServer:0;vesta.apache.org,58598,1294117333857.logRoller] wal.HLog(740):
> moving old hlog file
> /user/hudson/.logs/vesta.apache.org,58598,1294117333857/vesta.apache.org%3A58598.1294117406909
> whose highest sequenceid is 422 to
> /user/hudson/.oldlogs/vesta.apache.org%3A58598.1294117406909
> ...
> [MASTER_META_SERVER_OPERATIONS-vesta.apache.org:47907-0]
> master.MasterFileSystem(204):
> Failed splitting
> hdfs://localhost:50121/user/hudson/.logs/vesta.apache.org,58598,1294117333857
> java.io.IOException: Failed to open
> hdfs://localhost:50121/user/hudson/.logs/vesta.apache.org,58598,1294117333857/vesta.apache.org%3A58598.1294117406909
> for append
> Caused by: org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException:
> org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException:
> No lease on
> /user/hudson/.logs/vesta.apache.org,58598,1294117333857/vesta.apache.org%3A58598.1294117406909
> File does not exist. [Lease. Holder: DFSClient_-986975908, pendingcreates:
> 1]
> {noformat}
> We should probably just handle the fact that a file could have been archived
> (maybe even check in .oldlogs to be sure) and move on to the next log.