[ 
https://issues.apache.org/jira/browse/SAMZA-376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14097543#comment-14097543
 ] 

Chris Riccomini commented on SAMZA-376:
---------------------------------------

[~zjshen], does this sound accurate to you? It seems possible to me. We make a 
blocking call to Kafka in SamzaAppMaster before we call amClient.start. If the 
Kafka calls take a long time (say, several minutes), would it lead to this 
behavior?

> ApplicationMaster Timeout after LeaderNotAvailableException
> -----------------------------------------------------------
>
>                 Key: SAMZA-376
>                 URL: https://issues.apache.org/jira/browse/SAMZA-376
>             Project: Samza
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Nicolas Bär
>            Priority: Minor
>
> The application master does not send a heartbeat to the resource manager if 
> the leader of the topic is not available. It will retry until the leader is 
> available and then send the heartbeat. If the Kafka cluster is busy during 
> this time, the leader election might take a moment and the timeout is reached 
> resulting in a shutdown of the application master.
> I hit this issue on our testbed and received a few follow-up error messages 
> after the application master was restarted: 
> {quote}
> ERROR security.UserGroupInformation: PriviledgedActionException as:baer 
> (auth:SIMPLE) 
> cause:org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  Password not found for ApplicationAttempt 
> appattempt_1407522131931_0001_000001
> {quote}
> I will investigate this further, but I assume it is better placed on the 
> YARN mailing list.
> Here is the relevant part from our discussion on IRC (criccomini):
> {quote}
> SamzaAppMaster
> you'll see:       amClient.start
> and later,       amClient.stop
> the start is starting the YARN AMClient's heartbeat
> now
> SamzaAppMasterTaskManager
> calls assignContainerToSSPTaskNames
> in Util
> which calls Util.getInputStreamPartitions(config)
> and THAT is where Kafka is called
> so basically
> before amClient.start is called
> that getInputStreamPartitions method is invoked
> which will block on metadata timeouts
> until it can get the data it needs
> so SamzaAppMaster is constructing SamzaAppMasterTaskManager before it calls 
> amClient.start
> {quote}
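
The ordering described in the IRC excerpt above can be illustrated with a 
small, self-contained simulation. This is only a sketch under stated 
assumptions, not Samza or YARN code: the function name, the one-second tick, 
and the timeout values are all hypothetical and chosen for illustration. It 
models a blocking metadata call that either runs before the heartbeat is 
started (the order described above) or after it.

```python
def simulate_app_master(metadata_block_s, am_expiry_s,
                        heartbeat_interval_s=1, start_heartbeat_first=False):
    """Toy model: return True if the AM survives, False if it is expired.

    All names and values are hypothetical. Time advances in 1-second ticks
    while the (simulated) blocking Kafka metadata call is in progress.
    """
    # Assumption: the liveness timer starts when the attempt is launched.
    last_heartbeat = 0
    # In the order described above, the heartbeat is not yet running while
    # the metadata call blocks, because amClient.start has not been called.
    heartbeat_running = start_heartbeat_first

    for now in range(1, metadata_block_s + 1):
        if heartbeat_running and now % heartbeat_interval_s == 0:
            last_heartbeat = now  # background heartbeat reaches the RM
        if now - last_heartbeat > am_expiry_s:
            return False  # no heartbeat within the expiry window
    # The blocking call returned; the heartbeat starts (or keeps running).
    return True


# Metadata call blocks for 300 s, hypothetical expiry window of 120 s.
print(simulate_app_master(300, 120))                              # blocked first
print(simulate_app_master(5, 120))                                # short block
print(simulate_app_master(300, 120, start_heartbeat_first=True))  # heartbeat first
```

In this toy model the attempt is expired only when the blocking call outlasts 
the expiry window and the heartbeat has not been started yet. Whether 
reordering the real calls is safe in SamzaAppMaster is a separate question 
that the sketch does not answer.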



--
This message was sent by Atlassian JIRA
(v6.2#6252)
