Ok, so this node is not a gateway. It is part of the cluster, which means
you don't need slider-client.xml at all. Just have HADOOP_CONF_DIR
pointing to /etc/hadoop/conf in slider-env.sh and that should be it.

So the above simplifies your config setup. It will not solve either of the
2 problems you are facing.
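One way to confirm that the directory HADOOP_CONF_DIR points to really carries the cluster's RM HA settings is to parse its yarn-site.xml. The Python sketch below is a hypothetical helper, not part of Slider or Hadoop. It only checks the standard properties yarn.resourcemanager.ha.enabled, yarn.resourcemanager.ha.rm-ids and yarn.resourcemanager.hostname.<rm-id>. A cluster that configures per-RM addresses (for example yarn.resourcemanager.scheduler.address.<rm-id>) without hostnames would be flagged even though that setup may be valid.

```python
import os
import xml.etree.ElementTree as ET


def parse_hadoop_xml(text):
    """Parse the text (str or bytes) of a Hadoop-style *-site.xml into a {name: value} dict."""
    props = {}
    for prop in ET.fromstring(text).findall("property"):
        name = prop.findtext("name")
        if name:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def check_rm_ha(props):
    """Return a list of RM HA problems found in yarn-site.xml properties (empty if none).

    Hypothetical check: HA must be enabled, at least two RM ids must be listed,
    and every listed RM id must have a yarn.resourcemanager.hostname.<id> entry.
    """
    problems = []
    if props.get("yarn.resourcemanager.ha.enabled", "false").lower() != "true":
        problems.append("yarn.resourcemanager.ha.enabled is not true")
    rm_ids = [i.strip() for i in props.get("yarn.resourcemanager.ha.rm-ids", "").split(",") if i.strip()]
    if len(rm_ids) < 2:
        problems.append("yarn.resourcemanager.ha.rm-ids lists fewer than 2 RMs")
    for rm_id in rm_ids:
        key = "yarn.resourcemanager.hostname." + rm_id
        if not props.get(key):
            problems.append(key + " is not set")
    return problems


def check_conf_dir(conf_dir):
    """Check <conf_dir>/yarn-site.xml; a missing file is reported as a problem."""
    path = os.path.join(conf_dir, "yarn-site.xml")
    if not os.path.isfile(path):
        return ["missing " + path]
    # Read bytes so ElementTree honours any encoding declared in the XML prolog.
    with open(path, "rb") as f:
        return check_rm_ha(parse_hadoop_xml(f.read()))
```

For example, `check_conf_dir(os.environ.get("HADOOP_CONF_DIR", "/etc/hadoop/conf"))` returns an empty list when both RMs are described. It says nothing about Kerberos or token settings, so it only rules out the most basic config gap.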

Now, coming to the 2 issues you are facing, you will need to provide additional
logs for us to understand them better. Let's start with:
1. RM logs (specifically around the time when the rm1->rm2 failover is simulated)
2. Slider App logs
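For item 1, the RM log can be narrowed to the failover window by keeping only records whose timestamp falls between two bounds. This is a minimal Python sketch, assuming the RM log uses the same `YYYY-MM-DD HH:MM:SS,mmm` prefix as the Slider AM log quoted below; the function names and window bounds are hypothetical. Lines without a leading timestamp, such as stack-trace frames, stay with the record they follow.

```python
import re
from datetime import datetime

# Matches a log4j-style "YYYY-MM-DD HH:MM:SS,mmm" prefix at the start of a line.
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),(\d{3})")


def parse_ts(line):
    """Return the datetime at the start of a log line, or None if there is none."""
    m = TS_RE.match(line)
    if not m:
        return None
    base = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
    return base.replace(microsecond=int(m.group(2)) * 1000)


def lines_in_window(lines, start, end):
    """Yield log lines whose record timestamp lies in [start, end].

    Continuation lines (no timestamp, e.g. stack-trace frames) inherit the
    decision made for the most recent timestamped line.
    """
    keep = False
    for line in lines:
        ts = parse_ts(line)
        if ts is not None:
            keep = start <= ts <= end
        if keep:
            yield line
```

For example, `sys.stdout.writelines(lines_in_window(open(path), datetime(2016, 7, 25, 19, 0), datetime(2016, 7, 25, 19, 10)))` would print a ten-minute window, where `path` is the RM log file on your cluster.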

-Gour

On 7/25/16, 5:16 PM, "Manoj Samel" <manojsamelt...@gmail.com> wrote:

>   1. Not clear about your question on "gateway" node. The node running
>   slider is part of the hadoop cluster, and there are other services like
>   Oozie running on this node that use HDFS and YARN. So if your
>   question is whether the node is otherwise working with the HDFS and YARN
>   configuration, it is.
>   2. I copied all files from HADOOP_CONF_DIR (say /etc/hadoop/conf) to the
>   directory containing slider-client.xml (say /data/latest/conf).
>   3. In my earlier email I had made a mistake: HADOOP_CONF_DIR in
>   slider-env.sh was pointing to the original directory /etc/hadoop/conf. I
>   edited it to point to the same directory containing slider-client.xml and
>   slider-env.sh, i.e. /data/latest/conf.
>   4. I emptied slider-client.xml, so it just had
>   <configuration></configuration>. Creating the app worked, but the
>   Slider AM still shows the same issue, i.e. when RM1 goes from active to
>   standby, the slider AM goes from RUNNING to ACCEPTED state with the same
>   error about TOKEN. Also NOTE that when slider-client.xml is empty, the
>   "slider destroy xxx" command still fails with ZooKeeper connection
>   errors.
>   5. I then added the same parameters (as in my last email, except
>   HADOOP_CONF_DIR) to slider-client.xml and ran again. This time
>   slider-env.sh has HADOOP_CONF_DIR pointing to /data/latest/conf and
>   slider-client.xml does not have HADOOP_CONF_DIR. The same issue exists
>   (but "slider destroy" does not fail).
>   6. Could you explain what you expect to pick up from the Hadoop
>   configuration that will help with the RM token? If slider has a token
>   from RM1 and it switches to RM2, what does slider do to get a
>   delegation token for RM2 communication?
>   7. It is worth repeating that the issue happens only when RM1 was
>   active when the slider app was created and then RM1 becomes standby. If
>   RM2 was active when the slider app was created, then the slider AM keeps
>   running through any number of switches between RM2 and RM1, back and
>   forth.
>
>
>On Mon, Jul 25, 2016 at 4:21 PM, Gour Saha <gs...@hortonworks.com> wrote:
>
>> The node you are running slider from, is that a gateway node? Sorry for
>> not being explicit. I meant copy everything under /etc/hadoop/conf from
>> your cluster into some temp directory (say /tmp/hadoop_conf) in your
>> gateway node or local or whichever node you are running slider from. Then
>> set HADOOP_CONF_DIR to /tmp/hadoop_conf and clear everything out from
>> slider-client.xml.
>>
>> On 7/25/16, 4:12 PM, "Manoj Samel" <manojsamelt...@gmail.com> wrote:
>>
>> >Hi Gour,
>> >
>> >Thanks for your prompt reply.
>> >
>> >FYI, the issue happens when I create the slider app while rm1 is active and
>> >rm1 then fails over to rm2. As soon as rm2 becomes active, the slider AM
>> >goes from RUNNING to ACCEPTED state with the above error.
>> >
>> >Following your suggestion, I did the following:
>> >
>> >1) Copied core-site, hdfs-site, yarn-site, and mapred-site from
>> >HADOOP_CONF_DIR to the slider conf directory.
>> >2) Our slider-env.sh already had HADOOP_CONF_DIR set.
>> >3) I removed all properties from slider-client.xml EXCEPT the following:
>> >
>> >   - HADOOP_CONF_DIR
>> >   - slider.yarn.queue
>> >   - slider.zookeeper.quorum
>> >   - hadoop.registry.zk.quorum
>> >   - hadoop.registry.zk.root
>> >   - hadoop.security.authorization
>> >   - hadoop.security.authentication
>> >
>> >Then I made rm1 active, installed and created the slider app, and restarted
>> >rm1 (to make rm2 active). The slider AM again went from RUNNING to ACCEPTED
>> >state.
>> >
>> >Let me know if you want me to try further changes.
>> >
>> >If I make slider-client.xml completely empty per your suggestion, only
>> >the slider AM comes up, but it fails to start components. The AM log shows
>> >errors trying to connect to ZooKeeper like the one below.
>> >2016-07-25 23:07:41,532
>> >[AmExecutor-006-SendThread(localhost.localdomain:2181)] WARN
>> >zookeeper.ClientCnxn - Session 0x0 for server null, unexpected error,
>> >closing socket connection and attempting reconnect
>> >java.net.ConnectException: Connection refused
>> >
>> >Hence I kept minimal info in slider-client.xml
>> >
>> >FYI This is slider version 0.80
>> >
>> >Thanks,
>> >
>> >Manoj
>> >
>> >On Mon, Jul 25, 2016 at 2:54 PM, Gour Saha <gs...@hortonworks.com> wrote:
>> >
>> >> If possible, can you copy the entire contents of the directory
>> >> /etc/hadoop/conf and then set HADOOP_CONF_DIR in slider-env.sh to point
>> >> to it. Keep slider-client.xml empty.
>> >>
>> >> Now when you do the same rm1->rm2 and then the reverse failovers, do you
>> >> see the same behaviors?
>> >>
>> >> -Gour
>> >>
>> >> On 7/25/16, 2:28 PM, "Manoj Samel" <manojsamelt...@gmail.com> wrote:
>> >>
>> >> >Another observation (whatever it is worth)
>> >> >
>> >> >If the slider app is created and started when rm2 is active, then it
>> >> >seems to survive switches between rm2 and rm1 (and back), i.e.
>> >> >
>> >> >* rm2 is active
>> >> >* create and start slider application
>> >> >* fail over to rm1. Now the Slider AM keeps running
>> >> >* fail over to rm2 again. Slider AM still keeps running
>> >> >
>> >> >So it seems that if it starts with rm1 active, the AM goes to "ACCEPTED"
>> >> >state when the RM fails over to rm2. If it starts with rm2 active, it
>> >> >runs fine with any switches between rm1 and rm2.
>> >> >
>> >> >Any feedback ?
>> >> >
>> >> >Thanks,
>> >> >
>> >> >Manoj
>> >> >
>> >> >On Mon, Jul 25, 2016 at 12:25 PM, Manoj Samel <manojsamelt...@gmail.com> wrote:
>> >> >
>> >> >> Setup
>> >> >>
>> >> >> - Hadoop 2.6 with RM HA, Kerberos enabled
>> >> >> - Slider 0.80
>> >> >> - In my slider-client.xml, I have added all RM HA properties, including
>> >> >> the ones mentioned in http://markmail.org/message/wnhpp2zn6ixo65e3.
>> >> >>
>> >> >> Following is the issue
>> >> >>
>> >> >> * rm1 is active, rm2 is standby
>> >> >> * deploy and start slider application, it runs fine
>> >> >> * restart rm1, rm2 is now active.
>> >> >> * The slider AM now goes from RUNNING into "ACCEPTED" state. It stays
>> >> >> there until rm1 is made active again.
>> >> >>
>> >> >> In the slider AM log, it tries to connect to RM2 and the connection
>> >> >> fails due to org.apache.hadoop.security.AccessControlException: Client
>> >> >> cannot authenticate via:[TOKEN]. See the detailed log below.
>> >> >>
>> >> >> It seems it has some token (delegation token?) for RM1 but tries to use
>> >> >> the same(?) one for RM2 and fails. Am I missing some configuration?
>> >> >>
>> >> >> Thanks,
>> >> >>
>> >> >>
>> >> >>
>> >> >> 2016-07-25 19:06:48,088 [AMRM Heartbeater thread] INFO  client.ConfiguredRMFailoverProxyProvider - Failing over to rm2
>> >> >> 2016-07-25 19:06:48,088 [AMRM Heartbeater thread] WARN  security.UserGroupInformation - PriviledgedActionException as:abc@XYZ (auth:KERBEROS) cause:org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN]
>> >> >> 2016-07-25 19:06:48,088 [AMRM Heartbeater thread] WARN  ipc.Client - Exception encountered while connecting to the server : org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN]
>> >> >> 2016-07-25 19:06:48,088 [AMRM Heartbeater thread] WARN  security.UserGroupInformation - PriviledgedActionException as:abc@XYZ (auth:KERBEROS) cause:java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN]
>> >> >> 2016-07-25 19:06:48,089 [AMRM Heartbeater thread] INFO  retry.RetryInvocationHandler - Exception while invoking allocate of class ApplicationMasterProtocolPBClientImpl over rm2 after 287 fail over attempts. Trying to fail over immediately.
>> >> >> java.io.IOException: Failed on local exception: java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN]; Host Details : local host is: "<SliderAM HOST>/<slider AM Host IP>"; destination host is: "<RM2 HOST>":23130;
>> >> >>         at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772)
>> >> >>         at org.apache.hadoop.ipc.Client.call(Client.java:1476)
>> >> >>         at org.apache.hadoop.ipc.Client.call(Client.java:1403)
>> >> >>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
>> >> >>         at com.sun.proxy.$Proxy23.allocate(Unknown Source)
>> >> >>         at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:77)
>> >> >>         at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
>> >> >>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> >> >>         at java.lang.reflect.Method.invoke(Method.java:497)
>> >> >>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:252)
>> >> >>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
>> >> >>         at com.sun.proxy.$Proxy24.allocate(Unknown Source)
>> >> >>         at org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.allocate(AMRMClientImpl.java:278)
>> >> >>         at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$HeartbeatThread.run(AMRMClientAsyncImpl.java:224)
>> >> >> Caused by: java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN]
>> >> >>         at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:682)
>> >> >>         at java.security.AccessController.doPrivileged(Native Method)
>> >> >>         at javax.security.auth.Subject.doAs(Subject.java:422)
>> >> >>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
>> >> >>         at org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:645)
>> >> >>         at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:733)
>> >> >>         at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:370)
>> >> >>         at org.apache.hadoop.ipc.Client.getConnection(Client.java:1525)
>> >> >>         at org.apache.hadoop.ipc.Client.call(Client.java:1442)
>> >> >>         ... 12 more
>> >> >> Caused by: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN]
>> >> >>         at org.apache.hadoop.security.SaslRpcClient.selectSaslClient(SaslRpcClient.java:172)
>> >> >>         at org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:396)
>> >> >>         at org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:555)
>> >> >>         at org.apache.hadoop.ipc.Client$Connection.access$1800(Client.java:370)
>> >> >>         at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:725)
>> >> >>         at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:721)
>> >> >>         at java.security.AccessController.doPrivileged(Native Method)
>> >> >>         at javax.security.auth.Subject.doAs(Subject.java:422)
>> >> >>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
>> >> >>         at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:720)
>> >> >>         ... 15 more
>> >> >> 2016-07-25 19:06:48,089 [AMRM Heartbeater thread] INFO  client.ConfiguredRMFailoverProxyProvider - Failing over to rm1
>> >> >>
>> >>
>> >>
>>
>>
