[
https://issues.apache.org/jira/browse/TWILL-106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14201189#comment-14201189
]
Alvin Wang commented on TWILL-106:
----------------------------------
Created a new CDAP cluster with default delegation token configuration, 24 hr
kerberos token lifetime, and 0 kerberos token renew lifetime here:
112.72.148.146.bc.googleusercontent.com
Transaction service fails after 24 hours of running, due to "token is expired"
(sometimes creating a new cluster like this gives the "token is not found in
cache" error message.:
{code}
Launching main: public static void
org.apache.twill.internal.container.TwillContainerMain.main(java.lang.String[])
throws java.lang.Exception []
LOGBACK: No context given for
co.cask.cdap.logging.appender.kafka.KafkaLogAppender[KafkaLogAppender]
2014-11-05 20:29:30,513 - INFO
[ServiceDelegate:c.c.c.l.a.LogAppenderInitializer@65] - Initializing log
appender KafkaLogAppender
2014-11-05 20:29:30,545 - INFO
[ServiceDelegate:c.c.c.d.r.m.TransactionServiceTwillRunnable@86] - Initializing
runnable transaction
2014-11-05 20:29:30,649 - INFO
[ServiceDelegate:c.c.c.d.r.m.TransactionServiceTwillRunnable@88] - transaction
Setting host name to 112.72.148.146.bc.googleusercontent.com
2014-11-05 20:29:31,058 - INFO
[ServiceDelegate:c.c.c.d.r.m.TransactionServiceTwillRunnable@101] - Runnable
initialized transaction
2014-11-05 20:29:31,082 - INFO
[ServiceDelegate:c.c.c.c.t.AbstractMasterTwillRunnable@109] - Starting runnable
transaction
2014-11-05 20:29:31,106 - INFO
[ServiceDelegate:c.c.c.c.t.AbstractMasterTwillRunnable@118] - Runnable started
transaction
2014-11-06 20:29:54,985 - ERROR [tx-snapshot:c.c.t.TransactionManager@594] -
Aborting transaction manager due to: Snapshot (timestamp 1415305794971) failed
due to: token (HDFS_DELEGATION_TOKEN token 2 for yarn) is expired
org.apache.hadoop.ipc.RemoteException: token (HDFS_DELEGATION_TOKEN token 2 for
yarn) is expired
at org.apache.hadoop.ipc.Client.call(Client.java:1347)
~[hadoop-common-2.2.0.2.0.11.0-1.jar:na]
at org.apache.hadoop.ipc.Client.call(Client.java:1300)
~[hadoop-common-2.2.0.2.0.11.0-1.jar:na]
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
~[hadoop-common-2.2.0.2.0.11.0-1.jar:na]
at com.sun.proxy.$Proxy9.create(Unknown Source) ~[na:na]
at sun.reflect.GeneratedMethodAccessor32.invoke(Unknown Source) ~[na:na]
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
~[na:1.6.0_45]
at java.lang.reflect.Method.invoke(Method.java:597) ~[na:1.6.0_45]
at
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
~[hadoop-common-2.2.0.2.0.11.0-1.jar:na]
at
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
~[hadoop-common-2.2.0.2.0.11.0-1.jar:na]
at com.sun.proxy.$Proxy9.create(Unknown Source) ~[na:na]
at
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:227)
~[hadoop-hdfs-2.2.0.2.0.11.0-1.jar:na]
at
org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1392)
~[hadoop-hdfs-2.2.0.2.0.11.0-1.jar:na]
at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1382)
~[hadoop-hdfs-2.2.0.2.0.11.0-1.jar:na]
at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1307)
~[hadoop-hdfs-2.2.0.2.0.11.0-1.jar:na]
at
org.apache.hadoop.hdfs.DistributedFileSystem$6.doCall(DistributedFileSystem.java:384)
~[hadoop-hdfs-2.2.0.2.0.11.0-1.jar:na]
at
org.apache.hadoop.hdfs.DistributedFileSystem$6.doCall(DistributedFileSystem.java:380)
~[hadoop-hdfs-2.2.0.2.0.11.0-1.jar:na]
at
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
~[hadoop-common-2.2.0.2.0.11.0-1.jar:na]
at
org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:380)
~[hadoop-hdfs-2.2.0.2.0.11.0-1.jar:na]
at
org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:324)
~[hadoop-hdfs-2.2.0.2.0.11.0-1.jar:na]
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:905)
~[hadoop-common-2.2.0.2.0.11.0-1.jar:na]
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:886)
~[hadoop-common-2.2.0.2.0.11.0-1.jar:na]
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:848)
~[hadoop-common-2.2.0.2.0.11.0-1.jar:na]
at
co.cask.tephra.persist.HDFSTransactionStateStorage.writeSnapshot(HDFSTransactionStateStorage.java:120)
~[co.cask.tephra.tephra-core-0.3.0.jar:na]
at
co.cask.tephra.TransactionManager.doSnapshot(TransactionManager.java:420)
[co.cask.tephra.tephra-core-0.3.0.jar:na]
at
co.cask.tephra.TransactionManager.access$400(TransactionManager.java:101)
[co.cask.tephra.tephra-core-0.3.0.jar:na]
at
co.cask.tephra.TransactionManager$2.doRun(TransactionManager.java:257)
[co.cask.tephra.tephra-core-0.3.0.jar:na]
at
co.cask.tephra.TransactionManager$DaemonThreadExecutor.run(TransactionManager.java:1014)
[co.cask.tephra.tephra-core-0.3.0.jar:na]
2014-11-06 20:29:54,994 - ERROR [tx-snapshot:c.c.t.d.TransactionService$1$1@75]
- Transaction manager aborted, stopping transaction service
{code}
namenode logs indicate that token 2 (the one in the error message) is actually
an older token. Transaction service properly obtained a new delegation token
just before the failure:
{code}
2014-11-05 20:23:32,798 INFO [main]
delegation.AbstractDelegationTokenSecretManager
(AbstractDelegationTokenSecretManager.java:updateCurrentKey(223)) - Updating
the current master key for generating delegation tokens
2014-11-05 20:23:32,801 INFO [Thread[Thread-13,5,main]]
delegation.AbstractDelegationTokenSecretManager
(AbstractDelegationTokenSecretManager.java:run(536)) - Starting expired
delegation token remover thread, tokenRemoverScanInterval=60 min(s)
2014-11-05 20:23:32,801 INFO [Thread[Thread-13,5,main]]
delegation.AbstractDelegationTokenSecretManager
(AbstractDelegationTokenSecretManager.java:updateCurrentKey(223)) - Updating
the current master key for generating delegation tokens
2014-11-05 20:28:59,648 INFO [IPC Server handler 5 on 8020]
delegation.AbstractDelegationTokenSecretManager
(AbstractDelegationTokenSecretManager.java:createPassword(285)) - Creating
password for identifier: HDFS_DELEGATION_TOKEN token 1 for yarn
2014-11-05 20:28:59,701 INFO [IPC Server handler 0 on 8020]
delegation.AbstractDelegationTokenSecretManager
(AbstractDelegationTokenSecretManager.java:createPassword(285)) - Creating
password for identifier: HDFS_DELEGATION_TOKEN token 2 for yarn
2014-11-05 20:29:16,217 INFO [IPC Server handler 0 on 8020]
delegation.AbstractDelegationTokenSecretManager
(AbstractDelegationTokenSecretManager.java:renewToken(363)) - Token renewal
requested for identifier: HDFS_DELEGATION_TOKEN token 1 for yarn
2014-11-06 18:05:16,233 INFO [IPC Server handler 5 on 8020]
delegation.AbstractDelegationTokenSecretManager
(AbstractDelegationTokenSecretManager.java:renewToken(363)) - Token renewal
requested for identifier: HDFS_DELEGATION_TOKEN token 1 for yarn
2014-11-06 20:23:34,656 INFO [Thread[Thread-13,5,main]]
delegation.AbstractDelegationTokenSecretManager
(AbstractDelegationTokenSecretManager.java:updateCurrentKey(223)) - Updating
the current master key for generating delegation tokens
2014-11-06 20:23:57,326 INFO [IPC Server handler 5 on 8020]
delegation.AbstractDelegationTokenSecretManager
(AbstractDelegationTokenSecretManager.java:createPassword(285)) - Creating
password for identifier: HDFS_DELEGATION_TOKEN token 3 for yarn
2014-11-06 20:29:54,975 INFO [Socket Reader #1 for port 8020] ipc.Server
(Server.java:doRead(801)) - IPC Server listener on 8020: readAndProcess from
client 10.240.95.189 threw exception
[org.apache.hadoop.security.token.SecretManager$InvalidToken: token
(HDFS_DELEGATION_TOKEN token 2 for yarn) is expired]
2014-11-06 20:29:57,784 INFO [Socket Reader #1 for port 8020] ipc.Server
(Server.java:doRead(801)) - IPC Server listener on 8020: readAndProcess from
client 10.240.95.189 threw exception
[org.apache.hadoop.security.token.SecretManager$InvalidToken: token
(HDFS_DELEGATION_TOKEN token 2 for yarn) is expired]
2014-11-06 20:29:58,792 INFO [Socket Reader #1 for port 8020] ipc.Server
(Server.java:doRead(801)) - IPC Server listener on 8020: readAndProcess from
client 10.240.95.189 threw exception
[org.apache.hadoop.security.token.SecretManager$InvalidToken: token
(HDFS_DELEGATION_TOKEN token 2 for yarn) is expired]
2014-11-06 20:29:59,796 INFO [Socket Reader #1 for port 8020] ipc.Server
(Server.java:doRead(801)) - IPC Server listener on 8020: readAndProcess from
client 10.240.95.189 threw exception
[org.apache.hadoop.security.token.SecretManager$InvalidToken: token
(HDFS_DELEGATION_TOKEN token 2 for yarn) is expired]
2014-11-06 20:30:00,004 INFO [Socket Reader #1 for port 8020] ipc.Server
(Server.java:doRead(801)) - IPC Server listener on 8020: readAndProcess from
client 10.240.95.189 threw exception
[org.apache.hadoop.security.token.SecretManager$InvalidToken: token
(HDFS_DELEGATION_TOKEN token 2 for yarn) is expired]
2014-11-06 20:30:00,800 INFO [Socket Reader #1 for port 8020] ipc.Server
(Server.java:doRead(801)) - IPC Server listener on 8020: readAndProcess from
client 10.240.95.189 threw exception
[org.apache.hadoop.security.token.SecretManager$InvalidToken: token
(HDFS_DELEGATION_TOKEN token 2 for yarn) is expired]
2014-11-06 20:30:01,804 INFO [Socket Reader #1 for port 8020] ipc.Server
(Server.java:doRead(801)) - IPC Server listener on 8020: readAndProcess from
client 10.240.95.189 threw exception
[org.apache.hadoop.security.token.SecretManager$InvalidToken: token
(HDFS_DELEGATION_TOKEN token 2 for yarn) is expired]
{code}
> HDFS delegation token is not being refreshed properly
> -----------------------------------------------------
>
> Key: TWILL-106
> URL: https://issues.apache.org/jira/browse/TWILL-106
> Project: Apache Twill
> Issue Type: Bug
> Components: core
> Affects Versions: 0.4.0-incubating
> Reporter: Poorna Chandra
>
> We have a Twill app that runs in a secure Hadoop cluster. The app starts up
> fine, and runs for a day. I can see in logs that say secure store was updated
> regularly. However, after a day I see exceptions that say "token
> (HDFS_DELEGATION_TOKEN token 4287 for yarn) can't be found in cache".
> Exception:
> -------------
> 2014-10-23T04:12:42,101Z ERROR c.c.t.TransactionManager
> [cdap-secure120-1000.dev.continuuity.net] [tx-snapshot]
> TransactionManager:abortService(TransactionManager.java:594) - Aborting
> transaction manager due to: Snapshot (timestamp 1414037562088) failed due to:
> token (HDFS_DELEGATION_TOKEN token 4287 for yarn) can't be found in cache
> org.apache.hadoop.ipc.RemoteException: token (HDFS_DELEGATION_TOKEN token
> 4287 for yarn) can't be found in cache
> at org.apache.hadoop.ipc.Client.call(Client.java:1347)
> ...
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)