[ https://issues.apache.org/jira/browse/HDFS-2982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13278619#comment-13278619 ]
Colin Patrick McCabe commented on HDFS-2982:
--------------------------------------------
bq. The javadoc for JournalSet#selectInputStreams is a little over-simplified
=) - how about describing the algorithm (get the streams starting with fromTxid
from all managers, return a list sorted by the starting txid etc)
Ok, will add.
bq. In EditLogFileInputStream#init why only close the stream that threw?
Yeah, I guess closing an already-closed stream should be idempotent, at least
if the streams correctly implement the Closeable interface.
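The java.io.Closeable contract says that calling close() on a stream that is already closed has no effect, so closing every stream on failure is safe. A minimal sketch of that cleanup, assuming a hypothetical helper named StreamCleanup.closeAll (not code from the patch):

```java
import java.io.Closeable;
import java.io.IOException;

// Hypothetical helper, for illustration only; not code from the patch.
final class StreamCleanup {
  private StreamCleanup() {}

  /**
   * Closes every non-null stream and returns how many close() calls failed.
   * Safe to call on streams that are already closed, provided they honor
   * the Closeable contract (a second close() has no effect).
   */
  static int closeAll(Closeable... streams) {
    int failures = 0;
    for (Closeable c : streams) {
      if (c == null) {
        continue;
      }
      try {
        c.close();
      } catch (IOException e) {
        failures++; // keep going so the remaining streams still get closed
      }
    }
    return failures;
  }
}
```

With something like this, an init method could close all of its streams in the failure path without tracking which one threw.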
bq. In TestEditLog readAllEdits is dead code
ok
bq. How about describing the high-level approach in the patch?
From the high level, this patch is about getting rid of two APIs in
JournalManager-- getNumberOfTransactions and getInputStream, and adding one
API to JournalManager-- selectInputStreams. The new API simply gathers up all
the available streams in one go and puts them into a Collection. This is more
efficient, and also better for some of the changes we'd like to make in the
future, like supporting overlapping edit log streams.
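The gather-then-sort shape of the new API can be sketched roughly as follows. This is a simplified model, not the patch itself: StreamInfo, SimpleJournalManager, InMemoryJournalManager and SimpleJournalSet are hypothetical stand-ins, and the real HDFS types and signatures differ.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;

// Simplified stand-ins for illustration; the real HDFS types differ.
final class StreamInfo {
  final long firstTxId;
  final long lastTxId;

  StreamInfo(long firstTxId, long lastTxId) {
    this.firstTxId = firstTxId;
    this.lastTxId = lastTxId;
  }
}

interface SimpleJournalManager {
  /** Adds every stream that still contains fromTxId or a later txid. */
  void selectInputStreams(Collection<StreamInfo> out, long fromTxId);
}

// Hypothetical manager backed by an in-memory list of segments.
final class InMemoryJournalManager implements SimpleJournalManager {
  private final List<StreamInfo> segments;

  InMemoryJournalManager(List<StreamInfo> segments) {
    this.segments = segments;
  }

  @Override
  public void selectInputStreams(Collection<StreamInfo> out, long fromTxId) {
    for (StreamInfo s : segments) {
      if (s.lastTxId >= fromTxId) {
        out.add(s);
      }
    }
  }
}

final class SimpleJournalSet {
  private final List<SimpleJournalManager> managers;

  SimpleJournalSet(List<SimpleJournalManager> managers) {
    this.managers = managers;
  }

  /** Gathers streams from all managers in one pass, sorted by first txid. */
  List<StreamInfo> selectInputStreams(long fromTxId) {
    List<StreamInfo> all = new ArrayList<>();
    for (SimpleJournalManager m : managers) {
      m.selectInputStreams(all, fromTxId);
    }
    all.sort(Comparator.comparingLong((StreamInfo s) -> s.firstTxId));
    return all;
  }
}
```

Each manager is asked once, so the cost no longer grows with one directory listing per segment. Duplicate or overlapping streams are left in the result in this sketch.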
Edit log validation is the process of finding out how far in-progress edit logs
go. We do it during edit log finalization so that we can find out what to
rename the in-progress edit log file to. ("validation" might not be a great
name for this process, but it's probably too late to change it now.) We don't
validate finalized logs.
There are some minor changes to validation here, and a major change.
First, the minor changes. One change is to have the validation class contain
only the end txid, rather than the start txid, number of txids, and end txid.
The start txid is already known, and the number of txids does not represent
what you might think, but merely end - start + 1. So it's good to get rid of
that cruft. Another minor change is that EditLogValidation#corruptionDetected
was renamed to EditLogValidation#hasCorruptHeader. That is the concept it
always represented-- it never referred to anything other than header
corruption, and the rest of the code even uses the terminology hasCorruptHeader
to represent this info (see EditLogFile#hasCorruptHeader). So I'm just trying
to be consistent.
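As a rough illustration of the slimmed-down validation result, assuming hypothetical names (ValidationResult, getEndTxId, numTransactions) apart from hasCorruptHeader:

```java
// Hypothetical simplified shape, for illustration only. Only the name
// hasCorruptHeader comes from the discussion; the rest are assumptions.
final class ValidationResult {
  private final long endTxId;
  private final boolean hasCorruptHeader;

  ValidationResult(long endTxId, boolean hasCorruptHeader) {
    this.endTxId = endTxId;
    this.hasCorruptHeader = hasCorruptHeader;
  }

  long getEndTxId() {
    return endTxId;
  }

  boolean hasCorruptHeader() {
    return hasCorruptHeader;
  }

  /** The count is derivable from the range, so it need not be stored. */
  static long numTransactions(long startTxId, long endTxId) {
    return endTxId - startTxId + 1;
  }
}
```

A caller that already knows the start txid can still compute the range size on demand, for example numTransactions(101, 150) for a segment covering txids 101 through 150.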
The major change is that we now read to the end of a corrupt file in
validation, finding the true end transaction rather than merely the first
unreadable txid. This is needed for recovery to work properly on these files.
It's possible that this change could be dropped from this patch. Conceptually,
it's more related to HDFS-3049.
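The difference between the two scanning strategies can be shown with a toy model. Here a log is a list of records where null stands for an unreadable record; EditLogScan, its method names, and the -1 sentinel are all assumptions made for illustration, and the real code reads from edit log streams rather than lists.

```java
import java.util.List;

// Toy model for illustration only: null marks an unreadable record.
final class EditLogScan {
  static final long INVALID_TXID = -1; // assumed sentinel for "nothing read"

  private EditLogScan() {}

  /** Previous behavior in this model: stop at the first unreadable record. */
  static long lastTxIdBeforeFirstError(List<Long> records) {
    long last = INVALID_TXID;
    for (Long txid : records) {
      if (txid == null) {
        break;
      }
      last = txid;
    }
    return last;
  }

  /** New behavior in this model: skip unreadable records, scan to the end. */
  static long lastTxIdScanningToEnd(List<Long> records) {
    long last = INVALID_TXID;
    for (Long txid : records) {
      if (txid == null) {
        continue;
      }
      last = Math.max(last, txid);
    }
    return last;
  }
}
```

For records 1, 2, (unreadable), 4, 5 the first method reports 2 and the second reports 5. The second is the value recovery needs in order to know how far the file really extends.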
> Startup performance suffers when there are many edit log segments
> -----------------------------------------------------------------
>
> Key: HDFS-2982
> URL: https://issues.apache.org/jira/browse/HDFS-2982
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: name-node
> Affects Versions: 2.0.0
> Reporter: Todd Lipcon
> Assignee: Colin Patrick McCabe
> Priority: Critical
> Attachments: HDFS-2982.001.patch
>
>
> For every one of the edit log segments, it seems like we are calling
> listFiles on the edit log directory inside of {{findMaxTransaction}}. This is
> killing performance, especially when there are many log segments and the
> directory is stored on NFS. It is taking several minutes to start up the NN
> when there are several thousand log segments present.