[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15250484#comment-15250484
 ] 

Daniel Templeton commented on MAPREDUCE-6657:
---------------------------------------------

Thanks, [~haibochen].  Some comments:

{code}
 * HDFS is not running normally (either in start phrase or
{code}

should be

{code}
 * HDFS is not running normally (either in start phase or
{code}

{code}
  private static final String CLUSTER_BASE_DIR =
      MiniDFSCluster.getBaseDirectory();
...
    conf.set(MiniDFSCluster.HDFS_MINIDFS_BASEDIR, CLUSTER_BASE_DIR.substring(0,
        CLUSTER_BASE_DIR.length() - 1) + "_safemode");
{code}

Why not just set the base dir to what you want initially?

{code}
    final long maxJHSWaitTime = 500;
{code}

Tiny quibble: the name should probably be {{maxJhsWaitTime}}.  We have 
conflicting styles in the code, but IIRC the style guide says to only 
capitalize the first letter of acronyms in names.  (I could be wrong, so feel 
free to call my bluff.)

{code}
    dfsCluster.getFileSystem().setSafeMode(
        HdfsConstants.SafeModeAction.SAFEMODE_ENTER);
    Assert.assertTrue(dfsCluster.getFileSystem().isInSafeMode());
{code}

To be completely safe, these lines should be inside the try block.
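
A rough sketch of what I mean, assuming the test's finally block is what undoes safe mode (I have not checked what the cleanup actually does). {{FakeFs}} and {{runTest}} are hypothetical stand-ins so the snippet runs without a mini cluster; they are not Hadoop classes.

```java
final class SafeModeCleanupSketch {
  /** Hypothetical stand-in for the mini cluster's file system. */
  static final class FakeFs {
    boolean inSafeMode;

    void enterSafeMode() {
      inSafeMode = true;
    }

    void leaveSafeMode() {
      inSafeMode = false;
    }
  }

  /**
   * Enters safe mode inside the try, so the finally block runs even when
   * the check right after the state change fails.
   */
  static void runTest(FakeFs fs, boolean failAfterEnter) {
    try {
      fs.enterSafeMode();
      // Simulates the assertTrue(isInSafeMode()) line throwing.
      if (failAfterEnter) {
        throw new IllegalStateException("simulated assertion failure");
      }
      // ... the rest of the test body would go here ...
    } finally {
      // Assumed cleanup: always leave safe mode.
      fs.leaveSafeMode();
    }
  }
}
```

With the setup outside the try, a failure between entering safe mode and reaching the try would skip the cleanup entirely.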

{code}
      Assert.assertEquals("Job History Server is expected to be " +
          expectedExceptionMsg, expectedExceptionMsg, yex.getMessage());
{code}

should probably be more like

{code}
      Assert.assertEquals("Unexpected reconnect timeout exception message",
          expectedExceptionMsg, yex.getMessage());
{code}

The assert already includes the expected value in its failure output, so the message does not need to repeat it.

> job history server can fail on startup when NameNode is in start phase
> ----------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6657
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6657
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobhistoryserver
>            Reporter: Haibo Chen
>            Assignee: Haibo Chen
>         Attachments: mapreduce6657.001.patch, mapreduce6657.002.patch, 
> mapreduce6657.003.patch
>
>
> Job history server will try to create a history directory in HDFS on startup. 
> When NameNode is in safe mode, it will keep retrying for a configurable time 
> period.  However, it should also keep retrying if the name node is in start 
> state. Safe mode does not happen until the NN is out of the startup phase. 
> A RetriableException with the text "NameNode still not started" is thrown 
> when the NN is in its internal service startup phase. We should add the check 
> for this specific exception in isBecauseSafeMode() to account for that.
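
For illustration only, the check described above could look roughly like the following. This is a sketch, not the patch: {{StartupRetryCheck}} and {{shouldRetry}} are hypothetical names, and matching safe mode on the string "SafeModeException" is an assumption about how the existing check works. Only the "NameNode still not started" text comes from the description.

```java
import java.io.IOException;

/** Hypothetical helper; not the actual job history server code. */
final class StartupRetryCheck {
  // Message text quoted in the issue description.
  static final String NN_NOT_STARTED = "NameNode still not started";
  // Assumption: safe-mode failures are recognisable by this name fragment.
  static final String SAFE_MODE = "SafeModeException";

  /**
   * Returns true if the exception, or any exception in its cause chain,
   * looks like a safe-mode or NameNode-startup failure worth retrying.
   */
  static boolean shouldRetry(Throwable ex) {
    for (Throwable t = ex; t != null; t = t.getCause()) {
      // toString() yields "<class name>: <message>", so one contains()
      // call covers both the exception type and its text.
      String text = t.toString();
      if (text.contains(SAFE_MODE) || text.contains(NN_NOT_STARTED)) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    Throwable startup = new IOException(NN_NOT_STARTED);
    Throwable other = new IOException("disk full");
    System.out.println(shouldRetry(startup));
    System.out.println(shouldRetry(other));
  }
}
```

Matching on message text is brittle if the wording ever changes, but it avoids depending on how the exception is wrapped by the time it reaches the caller.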



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
