[ https://issues.apache.org/jira/browse/HDFS-8964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zhe Zhang updated HDFS-8964:
----------------------------
Attachment: HDFS-8964.02.patch
Updating the patch with a test (which turned out to be the hardest part).
Basically, we want to verify that the NN doesn't try to read past the latest synced
op when validating the edit log file. But the {{validateLog}} logic absorbs all
exceptions, so the test can't catch one directly.
Instead, the test checks the NameNode log for the error message that is expected
without the change.
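
The log-based check described above could look roughly like the following. This is a minimal, self-contained sketch and not the code in the attached patch: it uses {{java.util.logging}} rather than Hadoop's logging and test utilities, and {{LogCaptureSketch}}, {{CapturingHandler}}, {{validate}} and {{captureLogs}} are hypothetical names invented for illustration.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

public class LogCaptureSketch {
    static final Logger LOG = Logger.getLogger("LogCaptureSketch");

    /** Collects log messages so a test can inspect them afterwards. */
    static final class CapturingHandler extends Handler {
        final List<String> messages = new ArrayList<>();

        @Override
        public void publish(LogRecord record) {
            messages.add(record.getMessage());
        }

        @Override
        public void flush() {
        }

        @Override
        public void close() {
        }
    }

    /**
     * Hypothetical validator: like the behavior described above, it absorbs
     * the exception and only logs it, so callers never see a failure.
     */
    static void validate(boolean readsPastSyncedOp) {
        try {
            if (readsPastSyncedOp) {
                throw new IOException("hypothetical: could not decode op");
            }
        } catch (IOException e) {
            LOG.log(Level.WARNING, "Caught exception while scanning: " + e.getMessage());
        }
    }

    /** Runs the action and returns every message logged while it ran. */
    static List<String> captureLogs(Runnable action) {
        CapturingHandler handler = new CapturingHandler();
        LOG.addHandler(handler);
        try {
            action.run();
        } finally {
            LOG.removeHandler(handler);
        }
        return handler.messages;
    }

    public static void main(String[] args) {
        // Without the fix (simulated): the swallowed error shows up in the log.
        System.out.println(captureLogs(() -> validate(true)).size());
        // With the fix (simulated): nothing is logged.
        System.out.println(captureLogs(() -> validate(false)).size());
    }
}
```

Because the exception never reaches the caller, the captured log is the only signal the test can assert on.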
> Provide max TxId when validating in-progress edit log files
> -----------------------------------------------------------
>
> Key: HDFS-8964
> URL: https://issues.apache.org/jira/browse/HDFS-8964
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: journal-node, namenode
> Affects Versions: 2.7.1
> Reporter: Zhe Zhang
> Assignee: Zhe Zhang
> Attachments: HDFS-8964.00.patch, HDFS-8964.01.patch,
> HDFS-8964.02.patch
>
>
> The NN and JN validate in-progress edit log files in multiple scenarios, via
> {{EditLogFile#validateLog}}. The method scans through the edit log file to
> find the last transaction ID.
> However, an in-progress edit log file could be actively written to, which
> creates a race condition: the scan can read incorrect data, which we later
> attempt to interpret as ops.
> Currently {{validateLog}} is used in 3 places:
> # NN {{getEditsFromTxid}}
> # JN {{getEditLogManifest}}
> # NN/JN {{recoverUnfinalizedSegments}}
> In the first two scenarios we should provide a maximum TxId up to which the
> in-progress file is validated. The 3rd scenario won't cause a race condition
> because only non-current in-progress edit log files are validated.
> {{validateLog}} is actually only used with in-progress files, and could use a
> better name and Javadoc.
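
To illustrate the idea of bounding the scan by a maximum TxId, here is a small model in plain Java. It is a sketch under stated assumptions, not Hadoop code: {{BoundedEditLogScan}}, {{Op}}, {{ScanResult}}, {{scanLog}} and the {{maxTxIdToScan}} parameter are hypothetical names, and a "garbage" op stands in for bytes of a transaction that is still being written.

```java
import java.util.Arrays;
import java.util.List;

public class BoundedEditLogScan {
    static final long INVALID_TXID = -1L;

    /** Hypothetical stand-in for a decoded op; garbage models a partially written op. */
    static final class Op {
        final long txId;
        final boolean garbage;

        Op(long txId, boolean garbage) {
            this.txId = txId;
            this.garbage = garbage;
        }
    }

    /** Last good TxId seen, and whether the scan touched unsynced data. */
    static final class ScanResult {
        final long lastTxId;
        final boolean hitGarbage;

        ScanResult(long lastTxId, boolean hitGarbage) {
            this.lastTxId = lastTxId;
            this.hitGarbage = hitGarbage;
        }
    }

    /** Scans ops in order, stopping once maxTxIdToScan has been reached. */
    static ScanResult scanLog(List<Op> ops, long maxTxIdToScan) {
        long lastTxId = INVALID_TXID;
        for (Op op : ops) {
            if (lastTxId >= maxTxIdToScan) {
                break; // bound reached: do not look at anything written later
            }
            if (op.garbage) {
                return new ScanResult(lastTxId, true); // read past the last synced op
            }
            lastTxId = op.txId;
        }
        return new ScanResult(lastTxId, false);
    }

    public static void main(String[] args) {
        // Ops 1..3 are synced; op 4 is still being written.
        List<Op> inProgress = Arrays.asList(
            new Op(1, false), new Op(2, false), new Op(3, false), new Op(4, true));

        ScanResult unbounded = scanLog(inProgress, Long.MAX_VALUE);
        ScanResult bounded = scanLog(inProgress, 3);
        System.out.println(unbounded.lastTxId + " " + unbounded.hitGarbage); // 3 true
        System.out.println(bounded.lastTxId + " " + bounded.hitGarbage);     // 3 false
    }
}
```

In this model both scans report the same last TxId, but only the bounded scan avoids touching the partially written op, which is the race the first two scenarios are exposed to.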