[ 
https://issues.apache.org/jira/browse/OOZIE-3723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Olson updated OOZIE-3723:
--------------------------------
    Description: 
We have recently encountered two separate issues that both required an Oozie 
service restart to resolve. In both cases it was apparent that incorrect 
workflow-supplied configuration properties related to remote FileSystem 
connectivity, used for obtaining HDFS credentials for remote clusters (via 
{{mapreduce.job.hdfs-servers}}), were being retained permanently within some 
kind of cache in the Oozie service or the underlying Hadoop code. These cached 
values supersede the corrected values after the workflow configuration is 
fixed, leaving us no known way to resolve the problem without restarting the 
Oozie service. We confirmed that the {{hdfs-site.xml}} and {{oozie-site.xml}} 
files on the host where Oozie runs had not been updated since the prior 
restart, so this is not a basic case of stale file-based configuration.
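
One plausible mechanism, sketched below, is Hadoop's process-wide {{FileSystem}} cache: instances are keyed by (scheme, authority, UGI), and the {{Configuration}} contents are not part of the key, so whichever configuration first opens a given remote URI effectively sticks for the life of the JVM. This is only a hypothesis on our part; the snippet is a minimal standalone illustration with a hypothetical URI and values, not Oozie code.
{noformat}
// Minimal sketch of the suspected mechanism (hypothetical URI/values, not Oozie code).
// Hadoop caches FileSystem instances keyed by (scheme, authority, UGI); the
// Configuration passed on later calls is ignored once an instance is cached.
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class FsCacheDemo {
    public static void main(String[] args) throws Exception {
        URI remote = URI.create("hdfs://remote-cluster:8020"); // hypothetical remote NameNode

        Configuration bad = new Configuration();
        bad.set("dfs.namenode.kerberos.principal.pattern", "*/*@*"); // the incorrect workflow value

        // First call creates and caches the FileSystem with the bad configuration.
        FileSystem fs1 = FileSystem.get(remote, bad);

        Configuration good = new Configuration();
        good.set("dfs.namenode.kerberos.principal.pattern", "*"); // the corrected value

        // Same (scheme, authority, UGI) => the instance created with "bad" is
        // returned again; the corrected configuration never reaches the RPC layer.
        FileSystem fs2 = FileSystem.get(remote, good);
        System.out.println("same cached instance: " + (fs1 == fs2)); // prints true
    }
}
{noformat}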

We are running Oozie version 5.2.0 in this environment.

Complete stack traces are provided below.

Issue 1:

A workflow incorrectly set {{dfs.namenode.kerberos.principal.pattern}} to 
{{*/*@*}}, but our system default is {{*}}.
{noformat}
org.apache.oozie.action.ActionExecutorException: JA009: Couldn't set up IO streams: java.lang.IllegalArgumentException: Server has invalid Kerberos principal: nn/hostname.some.domain....@kerberos.realm.com, doesn't match the pattern: '*/*@*'
        at org.apache.oozie.action.ActionExecutor.convertExceptionHelper(ActionExecutor.java:463)
        at org.apache.oozie.action.ActionExecutor.convertException(ActionExecutor.java:443)
        at org.apache.oozie.action.hadoop.JavaActionExecutor.submitLauncher(JavaActionExecutor.java:1134)
        at org.apache.oozie.action.hadoop.JavaActionExecutor.start(JavaActionExecutor.java:1644)
        at org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:243)
        at org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:68)
        at org.apache.oozie.command.XCommand.call(XCommand.java:290)
        at org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:363)
        at org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:292)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:210)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750)
Caused by: java.io.IOException: Couldn't set up IO streams: java.lang.IllegalArgumentException: Server has invalid Kerberos principal: nn/hostname.some.domain....@kerberos.realm.com, doesn't match the pattern: '*/*@*'
        at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:894)
        at org.apache.hadoop.ipc.Client$Connection.access$3800(Client.java:414)
        at org.apache.hadoop.ipc.Client.getConnection(Client.java:1677)
        at org.apache.hadoop.ipc.Client.call(Client.java:1502)
        at org.apache.hadoop.ipc.Client.call(Client.java:1455)
        at org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:242)
        at org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:129)
        at com.sun.proxy.$Proxy35.getDelegationToken(Unknown Source)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getDelegationToken(ClientNamenodeProtocolTranslatorPB.java:1134)
        at sun.reflect.GeneratedMethodAccessor31.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
        at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
        at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
        at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
        at com.sun.proxy.$Proxy36.getDelegationToken(Unknown Source)
        at org.apache.hadoop.hdfs.DFSClient.getDelegationToken(DFSClient.java:734)
        at org.apache.hadoop.hdfs.DistributedFileSystem.getDelegationToken(DistributedFileSystem.java:1996)
        at org.apache.hadoop.security.token.DelegationTokenIssuer.collectDelegationTokens(DelegationTokenIssuer.java:95)
        at org.apache.hadoop.security.token.DelegationTokenIssuer.addDelegationTokens(DelegationTokenIssuer.java:76)
        at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:143)
        at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:102)
        at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodes(TokenCache.java:81)
        at org.apache.oozie.action.hadoop.HDFSCredentials$1.run(HDFSCredentials.java:103)
        at org.apache.oozie.action.hadoop.HDFSCredentials$1.run(HDFSCredentials.java:100)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878)
        at org.apache.oozie.action.hadoop.HDFSCredentials.obtainTokensForNamenodes(HDFSCredentials.java:99)
        at org.apache.oozie.action.hadoop.HDFSCredentials.updateCredentials(HDFSCredentials.java:65)
        at org.apache.oozie.action.hadoop.JavaActionExecutor.setCredentialTokens(JavaActionExecutor.java:1546)
        at org.apache.oozie.action.hadoop.JavaActionExecutor.submitLauncher(JavaActionExecutor.java:1082)
        ... 11 more
Caused by: java.lang.IllegalArgumentException: Server has invalid Kerberos principal: nn/hostname.some.domain....@kerberos.realm.com, doesn't match the pattern: '*/*@*'
        at org.apache.hadoop.security.SaslRpcClient.getServerPrincipal(SaslRpcClient.java:319)
        at org.apache.hadoop.security.SaslRpcClient.createSaslClient(SaslRpcClient.java:240)
        at org.apache.hadoop.security.SaslRpcClient.selectSaslClient(SaslRpcClient.java:166)
        at org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:392)
        at org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:623)
        at org.apache.hadoop.ipc.Client$Connection.access$2300(Client.java:414)
        at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:843)
        at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:839)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878)
        at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:839)
        ... 44 more
{noformat}
Issue 2:

A workflow mixed up FQDNs, incorrectly setting 
{{dfs.namenode.rpc-address.cluster.nn1}} to {{hostname.another.domain.com:8020}} 
instead of {{hostname.some.domain.com:8020}}.
{noformat}
org.apache.oozie.action.ActionExecutorException: JA001: Invalid host name: local host is: "hostname.some.domain.com/10.1.2.3"; destination host is: "hostname.another.domain.com":8020; java.net.UnknownHostException; For more details see:  http://wiki.apache.org/hadoop/UnknownHost
        at org.apache.oozie.action.ActionExecutor.convertExceptionHelper(ActionExecutor.java:463)
        at org.apache.oozie.action.ActionExecutor.convertException(ActionExecutor.java:443)
        at org.apache.oozie.action.hadoop.JavaActionExecutor.submitLauncher(JavaActionExecutor.java:1134)
        at org.apache.oozie.action.hadoop.JavaActionExecutor.start(JavaActionExecutor.java:1644)
        at org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:243)
        at org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:68)
        at org.apache.oozie.command.XCommand.call(XCommand.java:290)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:210)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750)
Caused by: java.net.UnknownHostException: Invalid host name: local host is: "hostname.some.domain.com/10.1.2.3"; destination host is: "hostname.another.domain.com":8020; java.net.UnknownHostException; For more details see:  http://wiki.apache.org/hadoop/UnknownHost
        at sun.reflect.GeneratedConstructorAccessor185.newInstance(Unknown Source)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:913)
        at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:841)
        at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:662)
        at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:833)
        at org.apache.hadoop.ipc.Client$Connection.access$3800(Client.java:414)
        at org.apache.hadoop.ipc.Client.getConnection(Client.java:1677)
        at org.apache.hadoop.ipc.Client.call(Client.java:1502)
        at org.apache.hadoop.ipc.Client.call(Client.java:1455)
        at org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:242)
        at org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:129)
        at com.sun.proxy.$Proxy35.getDelegationToken(Unknown Source)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getDelegationToken(ClientNamenodeProtocolTranslatorPB.java:1134)
        at sun.reflect.GeneratedMethodAccessor59.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
        at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
        at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
        at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
        at com.sun.proxy.$Proxy36.getDelegationToken(Unknown Source)
        at org.apache.hadoop.hdfs.DFSClient.getDelegationToken(DFSClient.java:734)
        at org.apache.hadoop.hdfs.DistributedFileSystem.getDelegationToken(DistributedFileSystem.java:1996)
        at org.apache.hadoop.security.token.DelegationTokenIssuer.collectDelegationTokens(DelegationTokenIssuer.java:95)
        at org.apache.hadoop.security.token.DelegationTokenIssuer.addDelegationTokens(DelegationTokenIssuer.java:76)
        at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:143)
        at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:102)
        at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodes(TokenCache.java:81)
        at org.apache.oozie.action.hadoop.HDFSCredentials$1.run(HDFSCredentials.java:103)
        at org.apache.oozie.action.hadoop.HDFSCredentials$1.run(HDFSCredentials.java:100)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878)
        at org.apache.oozie.action.hadoop.HDFSCredentials.obtainTokensForNamenodes(HDFSCredentials.java:99)
        at org.apache.oozie.action.hadoop.HDFSCredentials.updateCredentials(HDFSCredentials.java:65)
        at org.apache.oozie.action.hadoop.JavaActionExecutor.setCredentialTokens(JavaActionExecutor.java:1546)
        at org.apache.oozie.action.hadoop.JavaActionExecutor.submitLauncher(JavaActionExecutor.java:1082)
        ... 9 more
Caused by: java.net.UnknownHostException
        at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:664)
        ... 43 more
{noformat}
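
If the {{FileSystem}} cache hypothesis above is right, one possible mitigation (sketched below, untested on our side and with an obvious reuse cost) would be to disable caching for the affected scheme wherever credentials are obtained, so every lookup honors the current workflow-supplied configuration. The URI is hypothetical; {{fs.<scheme>.impl.disable.cache}} is a standard Hadoop switch.
{noformat}
// Hedged mitigation sketch (untested; hypothetical URI, not Oozie code).
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class UncachedFsSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // fs.<scheme>.impl.disable.cache makes FileSystem.get() build a fresh
        // instance from the current configuration instead of using the cache.
        conf.setBoolean("fs.hdfs.impl.disable.cache", true);

        // An uncached instance must be closed explicitly by the caller.
        try (FileSystem fs = FileSystem.get(URI.create("hdfs://remote-cluster:8020"), conf)) {
            // ... obtain delegation tokens here with the current, corrected settings ...
        }

        // Heavier alternative: FileSystem.closeAll() evicts every cached instance
        // in the JVM, but that is disruptive on a busy multi-tenant Oozie server.
    }
}
{noformat}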

> Oozie service permanently caches workflow-supplied FileSystem connectivity 
> configuration properties for obtaining HDFS Credentials until restarted
> --------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: OOZIE-3723
>                 URL: https://issues.apache.org/jira/browse/OOZIE-3723
>             Project: Oozie
>          Issue Type: Bug
>          Components: workflow
>            Reporter: Andrew Olson
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
