[ 
https://issues.apache.org/jira/browse/HDFS-979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13058059#comment-13058059
 ] 

Matt Foley commented on HDFS-979:
---------------------------------

Sorry for the delay, I was at the Hadoop Summit.  Thanks for waiting.
In general, more specific logging is good.  I wanted to check that this 
enhancement does not overlap with the reporting of bad dataDirs and editsDirs 
from e.g. the callers of  NNStorage#reportErrorsOnDirectories().  I now believe 
there is no collision, so go for it!  Here are some additional comments on this 
patch:

FSImage:

recoverTransitionRead()
* Please use dataDirs.isEmpty() rather than dataDirs.size() == 0; same for 
editsDirs.  But it might be best to also check that they are non-null before 
checking for emptiness.
* The enveloping check for "if (startOpt != StartupOption.IMPORT) {...}" 
concerns me, since the old code did not have this check.  It's confusing 
because this method is called from both loadFSImage() and doImportCheckpoint(), 
and in either case the dataDirs and editsDirs should be non-null and non-empty. 
 Here's the question:  When IMPORT is set, and loadFSImage calls 
recoverTransitionRead for the FIRST time (before the recursive call from 
doImportCheckpoint), isn't it important that dataDirs and editsDirs be checked 
for non-emptiness?  I'm not sure, because FSNamesystem#getStorageDirs does have 
a comment that says "//When importing image from a checkpoint, the name-node 
can start with empty set of storage directories."  But it also logs a warning 
that this is a really bad idea, since no edits log could be recorded.  Can you 
explain why the old code (without this check) was considered okay?
* Looking further at FSNamesystem#getStorageDirs, when IMPORT is not set, it 
seems you'll never get an empty directory list.  It does
{code}
if (dirNames.isEmpty())
      dirNames.add("file:///tmp/hadoop/dfs/name");
{code}

getEmptyRecoveryIOExceptionMsg()
* Comments have mis-spellings: editsDirs (not editDirs) and necessarily.
* Suggest that the dirType strings should be "fsimage" and "edits" rather than 
"data" and "edit", in call from recoverTransitionRead().  Better yet, since 
this really should be an enum, why not use the NameNodeDirType values IMAGE and 
EDITS?
* There's no need to pass configKey and directoriesToCheck args.  Since this is 
a single-use method, keep the signature clean by fetching them within the 
getEmptyRecoveryIOExceptionMsg() method, based on the dirType arg.
* In-line the value of "confValue".
* The test for dirsCollection.size() == 0 isn't necessary.  If dirsCollection 
isn't empty, this method won't be called, right?  But if for some reason it 
gets called when dirsCollection isn't empty, the log string would be suddenly 
terminated at "Specifically: ".  It's better to just proceed with constructing 
the log string.
* In fact, why bother sending dirsCollection as an arg at all?  It will always 
be empty.  The dirType has all the info needed.
* If dir.exists() print "EXISTS BUT IS APPARENTLY INACCESSIBLE", rather than 
just "EXISTS".
* Instead of limiting the method to URIs with scheme == "file", can you use the 
FileContext stuff to test .exists() on the directories regardless of scheme?


> FSImage should specify which dirs are missing when refusing to come up
> ----------------------------------------------------------------------
>
>                 Key: HDFS-979
>                 URL: https://issues.apache.org/jira/browse/HDFS-979
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: name-node
>    Affects Versions: 0.22.0
>            Reporter: Steve Loughran
>            Assignee: Jim Plush
>            Priority: Minor
>             Fix For: 0.23.0
>
>         Attachments: HDFS-979-take1.txt, HDFS-979-take2.txt, 
> HDFS-979-take3.txt
>
>
> When {{FSImage}} can't come up as either it has no data or edit dirs, it 
> tells me this
> {code}
> java.io.IOException: All specified directories are not accessible or do not 
> exist.
> {code}
> What it doesn't do is say which of the two attributes are missing. This would 
> be beneficial to anyone trying to track down the problem. Also, I don't think 
> the message is correct. It's bailing out because dataDirs.size() == 0 || 
> editsDirs.size() == 0 , because a list is empty -not because the dirs aren't 
> there, as there hasn't been any validation yet.
> More useful would be
> # Explicit mention of which attributes are null
> # Declare that this is because they are not in the config

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Reply via email to