[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14076493#comment-14076493
 ] 

Jason Lowe commented on MAPREDUCE-6011:
---------------------------------------

Sample error where a bad token state failed history server startup but didn't 
explain which file contained the bad token state:

{noformat}
2014-07-11 22:51:14,977 [main] INFO impl.MetricsSystemImpl: JobHistoryServer metrics system started
2014-07-11 22:51:16,079 [main] INFO hs.HistoryServerFileSystemStateStoreService: Loading history server state from hdfs:/xx
2014-07-11 22:51:46,747 [main] INFO service.AbstractService: Service org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer$HistoryServerSecretManagerService failed in state STARTED; cause: java.io.EOFException
java.io.EOFException
        at java.io.DataInputStream.readByte(DataInputStream.java:267)
        at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenIdentifier.readFields(AbstractDelegationTokenIdentifier.java:179)
        at org.apache.hadoop.mapreduce.v2.hs.HistoryServerFileSystemStateStoreService.loadToken(HistoryServerFileSystemStateStoreService.java:295)
        at org.apache.hadoop.mapreduce.v2.hs.HistoryServerFileSystemStateStoreService.loadTokensFromBucket(HistoryServerFileSystemStateStoreService.java:314)
        at org.apache.hadoop.mapreduce.v2.hs.HistoryServerFileSystemStateStoreService.loadTokens(HistoryServerFileSystemStateStoreService.java:353)
        at org.apache.hadoop.mapreduce.v2.hs.HistoryServerFileSystemStateStoreService.loadTokenState(HistoryServerFileSystemStateStoreService.java:367)
        at org.apache.hadoop.mapreduce.v2.hs.HistoryServerFileSystemStateStoreService.loadState(HistoryServerFileSystemStateStoreService.java:114)
        at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer$HistoryServerSecretManagerService.serviceStart(JobHistoryServer.java:89)
        at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
        at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120)
        at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.serviceStart(JobHistoryServer.java:194)
        at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
        at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.launchJobHistoryServer(JobHistoryServer.java:220)
        at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.main(JobHistoryServer.java:229)
2014-07-11 22:51:46,749 [main] INFO service.AbstractService: Service org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer failed in state STARTED; cause: org.apache.hadoop.service.ServiceStateException: java.io.EOFException
org.apache.hadoop.service.ServiceStateException: java.io.EOFException
        at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
        at org.apache.hadoop.service.AbstractService.start(AbstractService.java:204)
        at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120)
        at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.serviceStart(JobHistoryServer.java:194)
        at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
        at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.launchJobHistoryServer(JobHistoryServer.java:220)
        at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.main(JobHistoryServer.java:229)
Caused by: java.io.EOFException
        at java.io.DataInputStream.readByte(DataInputStream.java:267)
        at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenIdentifier.readFields(AbstractDelegationTokenIdentifier.java:179)
        at org.apache.hadoop.mapreduce.v2.hs.HistoryServerFileSystemStateStoreService.loadToken(HistoryServerFileSystemStateStoreService.java:295)
        at org.apache.hadoop.mapreduce.v2.hs.HistoryServerFileSystemStateStoreService.loadTokensFromBucket(HistoryServerFileSystemStateStoreService.java:314)
        at org.apache.hadoop.mapreduce.v2.hs.HistoryServerFileSystemStateStoreService.loadTokens(HistoryServerFileSystemStateStoreService.java:353)
        at org.apache.hadoop.mapreduce.v2.hs.HistoryServerFileSystemStateStoreService.loadTokenState(HistoryServerFileSystemStateStoreService.java:367)
        at org.apache.hadoop.mapreduce.v2.hs.HistoryServerFileSystemStateStoreService.loadState(HistoryServerFileSystemStateStoreService.java:114)
        at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer$HistoryServerSecretManagerService.serviceStart(JobHistoryServer.java:89)
        at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
        ... 5 more
2014-07-11 22:51:46,750 [main] INFO impl.MetricsSystemImpl: Stopping JobHistoryServer metrics system...
{noformat}

Note the lack of details on which token was being loaded.  Also, the log should 
be at least at the WARN level if we let the JHS continue past this error, or 
at least at the ERROR level if it remains fatal to starting up.
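
To make the two options concrete, here is a rough, standalone sketch of what 
each could look like.  This is not the actual Hadoop code or a proposed patch: 
the class, method names, file paths and record layout are all hypothetical, 
and java.util.logging stands in for Hadoop's logging so the example compiles 
on its own (Level.WARNING plays the role of WARN).

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical illustration only; not HistoryServerFileSystemStateStoreService.
class TokenRecoverySketch {
    private static final Logger LOG =
        Logger.getLogger(TokenRecoverySketch.class.getName());

    // Stand-in for deserializing one stored token. The layout (one version
    // byte, then an int) is an assumption made for this sketch. Truncated
    // data raises java.io.EOFException, as in the trace above.
    static int readToken(byte[] data) throws IOException {
        try (DataInputStream in =
                 new DataInputStream(new ByteArrayInputStream(data))) {
            in.readByte();       // version byte
            return in.readInt(); // hypothetical sequence number
        }
    }

    // Option 1: recovery stays fatal, but the exception names the file.
    static int loadTokenStrict(String path, byte[] data) throws IOException {
        try {
            return readToken(data);
        } catch (IOException e) {
            throw new IOException("Error loading token state from " + path, e);
        }
    }

    // Option 2: log the bad file at WARN and keep going.
    // Returns the number of tokens that loaded successfully.
    static int loadTokensLenient(Map<String, byte[]> files) {
        int loaded = 0;
        for (Map.Entry<String, byte[]> f : files.entrySet()) {
            try {
                readToken(f.getValue());
                loaded++;
            } catch (IOException e) {
                LOG.log(Level.WARNING,
                    "Skipping unreadable token file " + f.getKey(), e);
            }
        }
        return loaded;
    }

    public static void main(String[] args) {
        Map<String, byte[]> files = new LinkedHashMap<>();
        files.put("tokens/token_1", new byte[] {0, 0, 0, 0, 7}); // complete
        files.put("tokens/token_2", new byte[] {0});             // truncated
        System.out.println("loaded " + loadTokensLenient(files)
            + " of " + files.size());
        try {
            loadTokenStrict("tokens/token_2", files.get("tokens/token_2"));
        } catch (IOException e) {
            // The message names the file; the cause is the original error.
            System.out.println(e.getMessage() + " (" + e.getCause() + ")");
        }
    }
}
```

Either way the original EOFException is kept as the cause, so the stack trace 
is still available while the message identifies the file to inspect or remove 
during manual recovery.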

> Improve history server behavior during a recovery error
> -------------------------------------------------------
>
>                 Key: MAPREDUCE-6011
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6011
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: jobhistoryserver
>    Affects Versions: 2.3.0
>            Reporter: Jason Lowe
>
> Currently when the history server encounters an error during recovery it is 
> fatal without specific details on the error (e.g. which token was involved 
> during the recovery error).  We should either allow the history server to 
> proceed past recovery errors or provide more specifics on the offending token 
> involved in the fatal error to aid in manual recovery.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
