[ 
https://issues.apache.org/jira/browse/HDFS-3863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13445301#comment-13445301
 ] 

Todd Lipcon commented on HDFS-3863:
-----------------------------------

Hi Chao. I tried to add the sanity checks you suggested, and ran into a little 
difficulty with the first one. It caused a test failure in the following 
scenario:

JN1 has fallen behind, has: edits_inprogress with txid 44-45
JN2 and JN3 both finished writing this segment (44-47), had fully written 
48-51, and had started a log segment 52, without yet writing any transactions 
to it.

With the current code, when prepareRecovery() invoked scanStorage(), JN2 and 
JN3 returned an empty {{lastSegmentTxId}}. So, the client code went into 
recovery of the log segment starting at txid 44. It correctly recovered to 
44-47, but then the assertion failed because the other loggers had already 
seen txid 51 committed.
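
To make the failing assertion concrete, here is a minimal sketch of that kind of sanity check. The class and method names are hypothetical and are not taken from the patch; it only assumes that each logger remembers the highest txid it has seen committed and rejects a recovery that would end before it.

```java
// Hypothetical sketch of the sanity check; not the actual JournalNode code.
final class CommittedTxIdCheck {
    /**
     * Rejects a recovery whose resulting segment would end before a
     * transaction this logger has already seen committed.
     */
    static void checkRecoveryRange(long segmentStartTxId,
                                   long recoveredEndTxId,
                                   long committedTxId) {
        if (recoveredEndTxId < committedTxId) {
            throw new IllegalStateException(
                "Recovery of segment " + segmentStartTxId + " would end at txid "
                + recoveredEndTxId + ", but txid " + committedTxId
                + " is already committed");
        }
    }

    public static void main(String[] args) {
        try {
            // The scenario above: recovering 44-47 while 51 is committed.
            checkRecoveryRange(44, 47, 51);
        } catch (IllegalStateException e) {
            System.out.println("check failed: " + e.getMessage());
        }
    }
}
```

In the scenario above the check fires, which is the desired outcome: the real problem is that recovery was started on the wrong segment, not that the check is too strict.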

So, I had to fix {{scanStorage}} a bit so that it would return the correct most 
recent segment txid, even in this scenario.
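
Roughly, the intended behavior looks like the sketch below. This is an illustrative model only: the {{Segment}} class and the method shape are assumptions, not the real {{scanStorage}} code, and the empty segment's start txid is assumed to directly follow 48-51. The point is that a segment that was started but holds no transactions still counts as the most recent one.

```java
import java.util.Arrays;
import java.util.List;
import java.util.OptionalLong;

// Hypothetical model of a JournalNode's on-disk segments; not the real HDFS classes.
final class SegmentScanSketch {
    static final class Segment {
        final long startTxId;
        final int numTxns; // 0 for a segment that was started but never written to

        Segment(long startTxId, int numTxns) {
            this.startTxId = startTxId;
            this.numTxns = numTxns;
        }
    }

    /** Returns the start txid of the newest segment, counting empty ones too. */
    static OptionalLong lastSegmentTxId(List<Segment> segments) {
        return segments.stream().mapToLong(s -> s.startTxId).max();
    }

    public static void main(String[] args) {
        // JN2/JN3 in the scenario: 44-47, 48-51, then an empty in-progress segment.
        List<Segment> jn2 = Arrays.asList(
            new Segment(44, 4),
            new Segment(48, 4),
            new Segment(52, 0)); // assumed start txid, following 48-51
        System.out.println(lastSegmentTxId(jn2).getAsLong());
    }
}
```

With this answer from JN2 and JN3, the client would see that a newer segment exists rather than going into recovery of the segment starting at txid 44.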

I'll upload the improved patch soon after running some more test iterations. 
Thanks for the good idea, as it did catch a slight bug here!
                
> QJM: track last "committed" txid
> --------------------------------
>
>                 Key: HDFS-3863
>                 URL: https://issues.apache.org/jira/browse/HDFS-3863
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: ha
>    Affects Versions: QuorumJournalManager (HDFS-3077)
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>         Attachments: hdfs-3863-prelim.txt
>
>
> Per some discussion with [~stepinto] 
> [here|https://issues.apache.org/jira/browse/HDFS-3077?focusedCommentId=13422579&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13422579],
>  we should keep track of the "last committed txid" on each JournalNode. Then 
> during any recovery operation, we can sanity-check that we aren't asked to 
> truncate a log to an earlier transaction.
> This is also a necessary step if we want to support reading from in-progress 
> segments in the future (since we should only allow reads up to the commit 
> point).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
