[ https://issues.apache.org/jira/browse/HIVE-8956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14225614#comment-14225614 ]

Rui Li commented on HIVE-8956:
------------------------------

Thanks [~vanzin] for your input!
Adding {{JobReceived}} and {{JobStarted}} is great. But even with those, we 
still need a timeout for {{JobSubmitted}}, because the Spark job monitor 
depends on it to get the job ID. Otherwise, it's still possible the monitor 
will hang forever.
Besides, do you think we have to set timeouts between all these events?
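Just to make the point concrete, here is a minimal sketch of what a bounded wait for the job ID could look like on the monitor side. The class and handler names here are hypothetical, not the actual remote client API:
{code:java}
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: bound the wait for the JobSubmitted event so the
// monitor cannot block forever if the remote context dies before submission.
class JobMonitorSketch {
  private final CountDownLatch submitted = new CountDownLatch(1);
  private volatile Integer sparkJobId;

  // Called by the RPC layer when the (assumed) JobSubmitted event arrives.
  void onJobSubmitted(int jobId) {
    this.sparkJobId = jobId;
    submitted.countDown();
  }

  // Fails the query instead of hanging if no JobSubmitted event shows up.
  int awaitJobId(long timeout, TimeUnit unit) throws InterruptedException {
    if (!submitted.await(timeout, unit)) {
      throw new IllegalStateException("No JobSubmitted event within timeout");
    }
    return sparkJobId;
  }
}
{code}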

> Hive hangs while some error/exception happens beyond job execution [Spark 
> Branch]
> --------------------------------------------------------------------------------
>
>                 Key: HIVE-8956
>                 URL: https://issues.apache.org/jira/browse/HIVE-8956
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Chengxiang Li
>            Assignee: Rui Li
>              Labels: Spark-M3
>         Attachments: HIVE-8956.1-spark.patch
>
>
> The remote Spark client communicates with the remote Spark context 
> asynchronously. If an error/exception is thrown during job execution in the 
> remote Spark context, it is wrapped and sent back to the remote Spark 
> client. But if an error/exception is thrown outside job execution, such as 
> a job serialization failure, the remote Spark client never learns what's 
> going on in the remote Spark context, and it hangs there.
> Setting a timeout on the remote Spark client side may not be a great idea, 
> as we are not sure how long the query will run on the Spark cluster. We 
> need to find a way to check whether the job has failed (over its whole 
> life cycle) in the remote Spark context.
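What the description asks for amounts to reporting failures over the job's whole life cycle on the remote side, not only during execution. A minimal sketch of that idea follows; the {{ClientChannel}} interface and its {{sendError}} method are assumptions for illustration, not the actual remote client API:
{code:java}
// Hypothetical sketch: wrap the entire remote-side job life cycle so that
// failures outside execution (e.g. serialization) are still reported back.
interface ClientChannel {
  void sendError(String jobId, Throwable cause); // assumed RPC callback
}

class RemoteJobRunnerSketch {
  private final ClientChannel channel;

  RemoteJobRunnerSketch(ClientChannel channel) {
    this.channel = channel;
  }

  void run(String jobId, Runnable job) {
    try {
      job.run(); // covers serialization, submission and execution
    } catch (Throwable t) {
      // Report *any* failure, not just those raised during execution,
      // so the client side never waits on a job that silently died.
      channel.sendError(jobId, t);
    }
  }
}
{code}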


