[ https://issues.apache.org/jira/browse/OOZIE-3723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andrew Olson updated OOZIE-3723:
--------------------------------
Description:
We have recently encountered two separate issues that both required an Oozie service restart to resolve. In both cases it was apparent that incorrect workflow-supplied configuration properties for remote FileSystem connectivity, used to obtain HDFS credentials for remote clusters (via {{mapreduce.job.hdfs-servers}}), were being retained permanently within some kind of cache in the Oozie service or the underlying Hadoop code. These cached values supersede the corrected values even after the workflow configuration is fixed, leaving us no known way to resolve the problem without restarting the Oozie service. We confirmed that the {{hdfs-site.xml}} and {{oozie-site.xml}} files on the host where Oozie is running had not been updated since the prior restart, so this is not a basic case of stale configuration files. We are running Oozie version 5.2.0 in this environment.

Complete stack traces are provided below.

Issue 1: A workflow incorrectly set {{dfs.namenode.kerberos.principal.pattern}} to {{*/*@*}}, but our system default is {{*}}.
{noformat}
org.apache.oozie.action.ActionExecutorException: JA009: Couldn't set up IO streams: java.lang.IllegalArgumentException: Server has invalid Kerberos principal: nn/hostname.some.domain....@kerberos.realm.com, doesn't match the pattern: '*/*@*'
    at org.apache.oozie.action.ActionExecutor.convertExceptionHelper(ActionExecutor.java:463)
    at org.apache.oozie.action.ActionExecutor.convertException(ActionExecutor.java:443)
    at org.apache.oozie.action.hadoop.JavaActionExecutor.submitLauncher(JavaActionExecutor.java:1134)
    at org.apache.oozie.action.hadoop.JavaActionExecutor.start(JavaActionExecutor.java:1644)
    at org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:243)
    at org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:68)
    at org.apache.oozie.command.XCommand.call(XCommand.java:290)
    at org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:363)
    at org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:292)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:210)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)
Caused by: java.io.IOException: Couldn't set up IO streams: java.lang.IllegalArgumentException: Server has invalid Kerberos principal: nn/hostname.some.domain....@kerberos.realm.com, doesn't match the pattern: '*/*@*'
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:894)
    at org.apache.hadoop.ipc.Client$Connection.access$3800(Client.java:414)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1677)
    at org.apache.hadoop.ipc.Client.call(Client.java:1502)
    at org.apache.hadoop.ipc.Client.call(Client.java:1455)
    at org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:242)
    at org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:129)
    at com.sun.proxy.$Proxy35.getDelegationToken(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getDelegationToken(ClientNamenodeProtocolTranslatorPB.java:1134)
    at sun.reflect.GeneratedMethodAccessor31.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
    at com.sun.proxy.$Proxy36.getDelegationToken(Unknown Source)
    at org.apache.hadoop.hdfs.DFSClient.getDelegationToken(DFSClient.java:734)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getDelegationToken(DistributedFileSystem.java:1996)
    at org.apache.hadoop.security.token.DelegationTokenIssuer.collectDelegationTokens(DelegationTokenIssuer.java:95)
    at org.apache.hadoop.security.token.DelegationTokenIssuer.addDelegationTokens(DelegationTokenIssuer.java:76)
    at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:143)
    at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:102)
    at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodes(TokenCache.java:81)
    at org.apache.oozie.action.hadoop.HDFSCredentials$1.run(HDFSCredentials.java:103)
    at org.apache.oozie.action.hadoop.HDFSCredentials$1.run(HDFSCredentials.java:100)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878)
    at org.apache.oozie.action.hadoop.HDFSCredentials.obtainTokensForNamenodes(HDFSCredentials.java:99)
    at org.apache.oozie.action.hadoop.HDFSCredentials.updateCredentials(HDFSCredentials.java:65)
    at org.apache.oozie.action.hadoop.JavaActionExecutor.setCredentialTokens(JavaActionExecutor.java:1546)
    at org.apache.oozie.action.hadoop.JavaActionExecutor.submitLauncher(JavaActionExecutor.java:1082)
    ... 11 more
Caused by: java.lang.IllegalArgumentException: Server has invalid Kerberos principal: nn/hostname.some.domain....@kerberos.realm.com, doesn't match the pattern: '*/*@*'
    at org.apache.hadoop.security.SaslRpcClient.getServerPrincipal(SaslRpcClient.java:319)
    at org.apache.hadoop.security.SaslRpcClient.createSaslClient(SaslRpcClient.java:240)
    at org.apache.hadoop.security.SaslRpcClient.selectSaslClient(SaslRpcClient.java:166)
    at org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:392)
    at org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:623)
    at org.apache.hadoop.ipc.Client$Connection.access$2300(Client.java:414)
    at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:843)
    at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:839)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:839)
    ... 44 more
{noformat}

Issue 2: A workflow mixed up FQDNs, setting {{dfs.namenode.rpc-address.cluster.nn1}} to {{hostname.another.domain.com:8020}} instead of {{hostname.some.domain.com:8020}}.
{noformat}
org.apache.oozie.action.ActionExecutorException: JA001: Invalid host name: local host is: "hostname.some.domain.com/10.1.2.3"; destination host is: "hostname.another.domain.com":8020; java.net.UnknownHostException; For more details see: http://wiki.apache.org/hadoop/UnknownHost
    at org.apache.oozie.action.ActionExecutor.convertExceptionHelper(ActionExecutor.java:463)
    at org.apache.oozie.action.ActionExecutor.convertException(ActionExecutor.java:443)
    at org.apache.oozie.action.hadoop.JavaActionExecutor.submitLauncher(JavaActionExecutor.java:1134)
    at org.apache.oozie.action.hadoop.JavaActionExecutor.start(JavaActionExecutor.java:1644)
    at org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:243)
    at org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:68)
    at org.apache.oozie.command.XCommand.call(XCommand.java:290)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:210)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)
Caused by: java.net.UnknownHostException: Invalid host name: local host is: "hostname.some.domain.com/10.1.2.3"; destination host is: "hostname.another.domain.com":8020; java.net.UnknownHostException; For more details see: http://wiki.apache.org/hadoop/UnknownHost
    at sun.reflect.GeneratedConstructorAccessor185.newInstance(Unknown Source)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:913)
    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:841)
    at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:662)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:833)
    at org.apache.hadoop.ipc.Client$Connection.access$3800(Client.java:414)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1677)
    at org.apache.hadoop.ipc.Client.call(Client.java:1502)
    at org.apache.hadoop.ipc.Client.call(Client.java:1455)
    at org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:242)
    at org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:129)
    at com.sun.proxy.$Proxy35.getDelegationToken(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getDelegationToken(ClientNamenodeProtocolTranslatorPB.java:1134)
    at sun.reflect.GeneratedMethodAccessor59.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
    at com.sun.proxy.$Proxy36.getDelegationToken(Unknown Source)
    at org.apache.hadoop.hdfs.DFSClient.getDelegationToken(DFSClient.java:734)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getDelegationToken(DistributedFileSystem.java:1996)
    at org.apache.hadoop.security.token.DelegationTokenIssuer.collectDelegationTokens(DelegationTokenIssuer.java:95)
    at org.apache.hadoop.security.token.DelegationTokenIssuer.addDelegationTokens(DelegationTokenIssuer.java:76)
    at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:143)
    at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:102)
    at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodes(TokenCache.java:81)
    at org.apache.oozie.action.hadoop.HDFSCredentials$1.run(HDFSCredentials.java:103)
    at org.apache.oozie.action.hadoop.HDFSCredentials$1.run(HDFSCredentials.java:100)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878)
    at org.apache.oozie.action.hadoop.HDFSCredentials.obtainTokensForNamenodes(HDFSCredentials.java:99)
    at org.apache.oozie.action.hadoop.HDFSCredentials.updateCredentials(HDFSCredentials.java:65)
    at org.apache.oozie.action.hadoop.JavaActionExecutor.setCredentialTokens(JavaActionExecutor.java:1546)
    at org.apache.oozie.action.hadoop.JavaActionExecutor.submitLauncher(JavaActionExecutor.java:1082)
    ... 9 more
Caused by: java.net.UnknownHostException
    at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:664)
    ... 43 more
{noformat}
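The permanent retention described above is consistent with how Hadoop's {{FileSystem}} cache keys cached instances: by URI scheme, authority, and the calling user, but not by the contents of the {{Configuration}} used at creation time, so a later lookup with corrected settings can still resolve to the stale instance. The following plain-Java sketch (hypothetical names, a deliberately simplified model of that keying, not the actual Hadoop code) illustrates the failure mode:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Toy model of a FileSystem cache keyed by (scheme, authority, user) only.
// The configuration supplied at creation time is NOT part of the key, so a
// later lookup with corrected settings still returns the stale entry.
public class SimpleFsCache {
    static final class Key {
        final String scheme, authority, user;
        Key(String scheme, String authority, String user) {
            this.scheme = scheme;
            this.authority = authority;
            this.user = user;
        }
        @Override public boolean equals(Object o) {
            if (!(o instanceof Key)) return false;
            Key k = (Key) o;
            return scheme.equals(k.scheme)
                && authority.equals(k.authority)
                && user.equals(k.user);
        }
        @Override public int hashCode() { return Objects.hash(scheme, authority, user); }
    }

    // The value records the config snapshot the cached "FileSystem" was built with.
    private final Map<Key, Map<String, String>> cache = new HashMap<>();

    Map<String, String> get(String scheme, String authority, String user,
                            Map<String, String> conf) {
        // First caller's conf wins; conf passed by later callers is ignored.
        return cache.computeIfAbsent(new Key(scheme, authority, user),
                k -> new HashMap<>(conf));
    }

    public static void main(String[] args) {
        SimpleFsCache cache = new SimpleFsCache();

        Map<String, String> bad = new HashMap<>();
        bad.put("dfs.namenode.kerberos.principal.pattern", "*/*@*"); // broken workflow value

        Map<String, String> fixed = new HashMap<>();
        fixed.put("dfs.namenode.kerberos.principal.pattern", "*"); // corrected value

        Map<String, String> first = cache.get("hdfs", "cluster", "oozie", bad);
        Map<String, String> second = cache.get("hdfs", "cluster", "oozie", fixed);

        // Same cached entry comes back; the corrected pattern never takes effect.
        System.out.println(first == second); // true
        System.out.println(second.get("dfs.namenode.kerberos.principal.pattern")); // */*@*
    }
}
```

If this is indeed the mechanism, Hadoop's per-scheme cache bypass property ({{fs.hdfs.impl.disable.cache}}) or an explicit {{FileSystem.closeAll()}} might avoid the need for a full Oozie restart, though we have not verified either as an operational workaround.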
> Oozie service permanently caches workflow-supplied FileSystem connectivity configuration properties for obtaining HDFS Credentials until restarted
> --------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: OOZIE-3723
> URL: https://issues.apache.org/jira/browse/OOZIE-3723
> Project: Oozie
> Issue Type: Bug
> Components: workflow
> Reporter: Andrew Olson
> Priority: Major
at > org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:143) > at > org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:102) > at > org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodes(TokenCache.java:81) > at > org.apache.oozie.action.hadoop.HDFSCredentials$1.run(HDFSCredentials.java:103) > at > org.apache.oozie.action.hadoop.HDFSCredentials$1.run(HDFSCredentials.java:100) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878) > at > org.apache.oozie.action.hadoop.HDFSCredentials.obtainTokensForNamenodes(HDFSCredentials.java:99) > at > org.apache.oozie.action.hadoop.HDFSCredentials.updateCredentials(HDFSCredentials.java:65) > at > org.apache.oozie.action.hadoop.JavaActionExecutor.setCredentialTokens(JavaActionExecutor.java:1546) > at > org.apache.oozie.action.hadoop.JavaActionExecutor.submitLauncher(JavaActionExecutor.java:1082) > ... 9 more > Caused by: java.net.UnknownHostException > at > org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:664) > ... 43 more > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)