[
https://issues.apache.org/jira/browse/MAPREDUCE-6011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14076493#comment-14076493
]
Jason Lowe commented on MAPREDUCE-6011:
---------------------------------------
Sample error where bad token state caused history server startup to fail, but
the log did not say which file contained the bad token state:
{noformat}
2014-07-11 22:51:14,977 [main] INFO impl.MetricsSystemImpl: JobHistoryServer metrics system started
2014-07-11 22:51:16,079 [main] INFO hs.HistoryServerFileSystemStateStoreService: Loading history server state from hdfs:/xx
2014-07-11 22:51:46,747 [main] INFO service.AbstractService: Service org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer$HistoryServerSecretManagerService failed in state STARTED; cause: java.io.EOFException
java.io.EOFException
    at java.io.DataInputStream.readByte(DataInputStream.java:267)
    at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenIdentifier.readFields(AbstractDelegationTokenIdentifier.java:179)
    at org.apache.hadoop.mapreduce.v2.hs.HistoryServerFileSystemStateStoreService.loadToken(HistoryServerFileSystemStateStoreService.java:295)
    at org.apache.hadoop.mapreduce.v2.hs.HistoryServerFileSystemStateStoreService.loadTokensFromBucket(HistoryServerFileSystemStateStoreService.java:314)
    at org.apache.hadoop.mapreduce.v2.hs.HistoryServerFileSystemStateStoreService.loadTokens(HistoryServerFileSystemStateStoreService.java:353)
    at org.apache.hadoop.mapreduce.v2.hs.HistoryServerFileSystemStateStoreService.loadTokenState(HistoryServerFileSystemStateStoreService.java:367)
    at org.apache.hadoop.mapreduce.v2.hs.HistoryServerFileSystemStateStoreService.loadState(HistoryServerFileSystemStateStoreService.java:114)
    at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer$HistoryServerSecretManagerService.serviceStart(JobHistoryServer.java:89)
    at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
    at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120)
    at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.serviceStart(JobHistoryServer.java:194)
    at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
    at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.launchJobHistoryServer(JobHistoryServer.java:220)
    at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.main(JobHistoryServer.java:229)
2014-07-11 22:51:46,749 [main] INFO service.AbstractService: Service org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer failed in state STARTED; cause: org.apache.hadoop.service.ServiceStateException: java.io.EOFException
org.apache.hadoop.service.ServiceStateException: java.io.EOFException
    at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
    at org.apache.hadoop.service.AbstractService.start(AbstractService.java:204)
    at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120)
    at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.serviceStart(JobHistoryServer.java:194)
    at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
    at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.launchJobHistoryServer(JobHistoryServer.java:220)
    at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.main(JobHistoryServer.java:229)
Caused by: java.io.EOFException
    at java.io.DataInputStream.readByte(DataInputStream.java:267)
    at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenIdentifier.readFields(AbstractDelegationTokenIdentifier.java:179)
    at org.apache.hadoop.mapreduce.v2.hs.HistoryServerFileSystemStateStoreService.loadToken(HistoryServerFileSystemStateStoreService.java:295)
    at org.apache.hadoop.mapreduce.v2.hs.HistoryServerFileSystemStateStoreService.loadTokensFromBucket(HistoryServerFileSystemStateStoreService.java:314)
    at org.apache.hadoop.mapreduce.v2.hs.HistoryServerFileSystemStateStoreService.loadTokens(HistoryServerFileSystemStateStoreService.java:353)
    at org.apache.hadoop.mapreduce.v2.hs.HistoryServerFileSystemStateStoreService.loadTokenState(HistoryServerFileSystemStateStoreService.java:367)
    at org.apache.hadoop.mapreduce.v2.hs.HistoryServerFileSystemStateStoreService.loadState(HistoryServerFileSystemStateStoreService.java:114)
    at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer$HistoryServerSecretManagerService.serviceStart(JobHistoryServer.java:89)
    at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
    ... 5 more
2014-07-11 22:51:46,750 [main] INFO impl.MetricsSystemImpl: Stopping JobHistoryServer metrics system...
{noformat}
Note the lack of details on which token was being loaded. Also, the log should
be at least at the WARN level if we let the JHS continue past this error, or
at least at the ERROR level if it remains fatal to starting up.
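
As a rough illustration of the two options above (name the offending file in a
fatal error, or log a warning and continue), here is a minimal standalone
sketch. It does not use the Hadoop classes: TokenRecoverySketch, loadToken,
loadTokens, and the failOnError flag are all hypothetical names, and a "token"
is assumed to be a single long read from a local file rather than a real
delegation token identifier.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class TokenRecoverySketch {
    /** Hypothetical stand-in for reading one stored token; not the real Hadoop code. */
    static long loadToken(Path tokenFile) throws IOException {
        try (DataInputStream in = new DataInputStream(Files.newInputStream(tokenFile))) {
            return in.readLong(); // throws EOFException if the file is truncated
        } catch (IOException e) {
            // Name the offending file so an operator knows what to inspect or remove.
            throw new IOException("Error loading token state from " + tokenFile, e);
        }
    }

    /** Loads every token file; fails fast or skips bad files depending on failOnError. */
    static List<Long> loadTokens(List<Path> tokenFiles, boolean failOnError)
            throws IOException {
        List<Long> tokens = new ArrayList<>();
        for (Path file : tokenFiles) {
            try {
                tokens.add(loadToken(file));
            } catch (IOException e) {
                if (failOnError) {
                    // Fatal case: report at ERROR level, then propagate.
                    System.err.println("ERROR " + e.getMessage());
                    throw e;
                }
                // Non-fatal case: report at WARN level and keep going.
                System.err.println("WARN skipping unreadable token: " + e.getMessage());
            }
        }
        return tokens;
    }
}
```

Wrapping the exception keeps the original EOFException as the cause, so the
stack trace is preserved while the message identifies the file involved.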
> Improve history server behavior during a recovery error
> -------------------------------------------------------
>
> Key: MAPREDUCE-6011
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6011
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: jobhistoryserver
> Affects Versions: 2.3.0
> Reporter: Jason Lowe
>
> Currently when the history server encounters an error during recovery it is
> fatal without specific details on the error (e.g. which token was involved
> during the recovery error). We should either allow the history server to
> proceed past recovery errors or provide more specifics on the offending token
> involved in the fatal error to aid in manual recovery.