[
https://issues.apache.org/jira/browse/HDFS-979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13058059#comment-13058059
]
Matt Foley commented on HDFS-979:
---------------------------------
Sorry for the delay, I was at the Hadoop Summit. Thanks for waiting.
In general, more specific logging is good. I wanted to check that this
enhancement does not overlap with the reporting of bad dataDirs and editsDirs
from e.g. the callers of NNStorage#reportErrorsOnDirectories(). I now believe
there is no collision, so go for it! Here are some additional comments on this
patch:
FSImage:
recoverTransitionRead()
* Please use dataDirs.isEmpty() rather than dataDirs.size() == 0; same for
editsDirs. But it might be best to also check that they are non-null before
checking for emptiness.
* The enveloping check for "if (startOpt != StartupOption.IMPORT) {...}"
concerns me, since the old code did not have this check. It's confusing
because this method is called from both loadFSImage() and doImportCheckpoint(),
and in either case the dataDirs and editsDirs should be non-null and non-empty.
Here's the question: When IMPORT is set, and loadFSImage calls
recoverTransitionRead for the FIRST time (before the recursive call from
doImportCheckpoint), isn't it important that dataDirs and editsDirs be checked
for non-emptiness? I'm not sure, because FSNamesystem#getStorageDirs does have
a comment that says "//When importing image from a checkpoint, the name-node
can start with empty set of storage directories." But it also logs a warning
that this is a really bad idea, since no edits log could be recorded. Can you
explain why the old code (without this check) was considered okay?
* Looking further at FSNamesystem#getStorageDirs, when IMPORT is not set, it
seems you'll never get an empty directory list. It does
{code}
if (dirNames.isEmpty())
dirNames.add("file:///tmp/hadoop/dfs/name");
{code}
getEmptyRecoveryIOExceptionMsg()
* Comments have mis-spellings: editsDirs (not editDirs) and necessarily.
* Suggest that the dirType strings should be "fsimage" and "edits" rather than
"data" and "edit", in call from recoverTransitionRead(). Better yet, since
this really should be an enum, why not use the NameNodeDirType values IMAGE and
EDITS?
* There's no need to pass configKey and directoriesToCheck args. Since this is
a single-use method, keep the signature clean by fetching them within the
getEmptyRecoveryIOExceptionMsg() method, based on the dirType arg.
* In-line the value of "confValue".
* The test for dirsCollection.size() == 0 isn't necessary. If dirsCollection
isn't empty, this method won't be called, right? But if for some reason it
gets called when dirsCollection isn't empty, the log string would be suddenly
terminated at "Specifically: ". It's better to just proceed with constructing
the log string.
* In fact, why bother sending dirsCollection as an arg at all? It will always
be empty. The dirType has all the info needed.
* If dir.exists() print "EXISTS BUT IS APPARENTLY INACCESSIBLE", rather than
just "EXISTS".
* Instead of limiting the method to URIs with scheme == "file", can you use the
FileContext stuff to test .exists() on the directories regardless of scheme?
> FSImage should specify which dirs are missing when refusing to come up
> ----------------------------------------------------------------------
>
> Key: HDFS-979
> URL: https://issues.apache.org/jira/browse/HDFS-979
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: name-node
> Affects Versions: 0.22.0
> Reporter: Steve Loughran
> Assignee: Jim Plush
> Priority: Minor
> Fix For: 0.23.0
>
> Attachments: HDFS-979-take1.txt, HDFS-979-take2.txt,
> HDFS-979-take3.txt
>
>
> When {{FSImage}} can't come up as either it has no data or edit dirs, it
> tells me this
> {code}
> java.io.IOException: All specified directories are not accessible or do not
> exist.
> {code}
> What it doesn't do is say which of the two attributes are missing. This would
> be beneficial to anyone trying to track down the problem. Also, I don't think
> the message is correct. It's bailing out because dataDirs.size() == 0 ||
> editsDirs.size() == 0 , because a list is empty -not because the dirs aren't
> there, as there hasn't been any validation yet.
> More useful would be
> # Explicit mention of which attributes are null
> # Declare that this is because they are not in the config
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira