[ 
https://issues.apache.org/jira/browse/HDFS-2709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13177540#comment-13177540
 ] 

Todd Lipcon commented on HDFS-2709:
-----------------------------------

A few thoughts on the overall approach:
- Rather than modify EditLogFileInputStream to take a startTxId, why not do the 
"skipping" (what you call {{setInitialPosition}}) from the caller? ie modify 
{{FSEditLogLoader}} to skip the transactions that have already been replayed? 
The skipping code doesn't seem specific to the input stream itself.
- I'm not convinced that we need the {{partialLoadOk}} flag in 
{{FSEditLogLoader}}. IMO if the log is truncated, it's still an error as far as 
the loader is concerned - we just want to let the caller continue from where 
the error occurred. The only trick is how to go about getting the last 
successfully loaded txid out of the FSEditLogLoader in the error case -- I 
guess a member variable and a getter would work there? Do you think this ends 
up messier than the way you've done it?
- Can we add some non-HA tests that exercise 
FileJournalManager/FSEditLogLoader's ability to start mid-stream? Not sure if 
that's feasible.
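
To make the first two points concrete, here is a rough sketch of what I have in mind. It is only an illustration: {{EditLoaderSketch}} and {{Op}} are hypothetical stand-ins invented for this example, not the real {{FSEditLogLoader}} or edit log op classes, and the "corrupt" flag merely simulates a truncated or unreadable log. The loader skips transactions it has already applied, treats a bad op as an error, and exposes the last successfully applied txid through a getter so the caller can resume from there.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Hypothetical stand-in for an edit log loader; not the real Hadoop class. */
class EditLoaderSketch {

    /** Hypothetical edit record: a txid plus a flag simulating a bad op. */
    static final class Op {
        final long txId;
        final boolean corrupt;

        Op(long txId, boolean corrupt) {
            this.txId = txId;
            this.corrupt = corrupt;
        }
    }

    // Survives a failed load, so the caller can read it in the error case.
    private long lastAppliedTxId;

    // Records which txids were applied, purely so the sketch is observable.
    final List<Long> applied = new ArrayList<>();

    EditLoaderSketch(long lastAppliedTxId) {
        this.lastAppliedTxId = lastAppliedTxId;
    }

    long getLastAppliedTxId() {
        return lastAppliedTxId;
    }

    /** Replays ops in order, skipping any that were already applied. */
    void load(Iterator<Op> ops) throws IOException {
        while (ops.hasNext()) {
            Op op = ops.next();
            if (op.txId <= lastAppliedTxId) {
                continue; // already replayed on an earlier attempt
            }
            if (op.corrupt) {
                // Still an error as far as the loader is concerned.
                throw new IOException("Failed to read txid " + op.txId);
            }
            applied.add(op.txId);
            lastAppliedTxId = op.txId;
        }
    }

    public static void main(String[] args) throws IOException {
        EditLoaderSketch loader = new EditLoaderSketch(0);
        List<Op> broken = Arrays.asList(
            new Op(1, false), new Op(2, false), new Op(3, true));
        try {
            loader.load(broken.iterator());
        } catch (IOException e) {
            System.out.println("Stopped after txid " + loader.getLastAppliedTxId());
        }
        // Next iteration re-reads the file from the start; txids 1-2 are skipped.
        List<Op> repaired = Arrays.asList(
            new Op(1, false), new Op(2, false), new Op(3, false));
        loader.load(repaired.iterator());
        System.out.println("Applied: " + loader.applied);
    }
}
```

With this shape the input stream needs no startTxId and the loader needs no "partial load is OK" flag: the caller catches the exception, and the next attempt can re-read the same file from the beginning without applying any edit twice.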
                
> HA: Appropriately handle error conditions in EditLogTailer
> ----------------------------------------------------------
>
>                 Key: HDFS-2709
>                 URL: https://issues.apache.org/jira/browse/HDFS-2709
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: ha, name-node
>    Affects Versions: HA branch (HDFS-1623)
>            Reporter: Todd Lipcon
>            Assignee: Aaron T. Myers
>            Priority: Critical
>         Attachments: HDFS-2709-HDFS-1623.patch, HDFS-2709-HDFS-1623.patch, 
> HDFS-2709-HDFS-1623.patch
>
>
> Currently if the edit log tailer experiences an error replaying edits in the 
> middle of a file, it will go back to retrying from the beginning of the file 
> on the next tailing iteration. This is incorrect since many of the edits will 
> have already been replayed, and not all edits are idempotent.
> Instead, we either need to (a) support reading from the middle of a finalized 
> file (ie skip those edits already applied), or (b) abort the standby if it 
> hits an error while tailing. If "a" isn't simple, let's do "b" for now and 
> come back to 'a' later since this is a rare circumstance and better to abort 
> than be incorrect.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
