[
https://issues.apache.org/jira/browse/HIVE-16094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Siddharth Seth updated HIVE-16094:
----------------------------------
Attachment: HIVE-16094.01.patch
The problem was that if an AM was picked up by the queueDrainer while it had 0
fragments, it was not put back in the queue. registerFragment only adds a new
entry to the queue if the AM is not already known, so a known AM was never
re-queued.
AMNodeInfo instances were originally meant to be used across multiple queries
belonging to an AM. We could still achieve that by going back to the old model
of reference counting.
However, I think it's cleaner to maintain one AMNodeInfo instance per query.
So the patch changes the key to be the queryIdentifier. An AMNodeInfo instance
is always kept in the queue, and a heartbeat is only sent if there are pending
fragments. The instance is removed from the queue after query completion, or
if an error is hit.
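A minimal, single-threaded sketch of the per-query model described above. It is
not the code in the patch: the class, field, and method names below
(PerQueryHeartbeatSketch, drainOnce, queryComplete, and so on) are hypothetical,
the query identifier is reduced to a String, and the real heartbeat RPC is
replaced by a counter.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Hypothetical illustration only; names do not mirror the actual Hive classes.
class PerQueryHeartbeatSketch {

    // One entry per query, standing in for AMNodeInfo.
    static final class AmNodeInfo {
        final String queryId;
        int pendingFragments;
        int heartbeatsSent;

        AmNodeInfo(String queryId) {
            this.queryId = queryId;
        }
    }

    // Keyed by query identifier rather than by AM.
    private final Map<String, AmNodeInfo> known = new HashMap<>();
    private final Queue<AmNodeInfo> queue = new ArrayDeque<>();

    // Creates and queues the entry on first use, then counts the fragment.
    void registerFragment(String queryId) {
        AmNodeInfo info = known.get(queryId);
        if (info == null) {
            info = new AmNodeInfo(queryId);
            known.put(queryId, info);
            queue.add(info);
        }
        info.pendingFragments++;
    }

    void unregisterFragment(String queryId) {
        AmNodeInfo info = known.get(queryId);
        if (info != null && info.pendingFragments > 0) {
            info.pendingFragments--;
        }
    }

    // Query completion (or an error) is the only path that drops the entry.
    void queryComplete(String queryId) {
        AmNodeInfo info = known.remove(queryId);
        if (info != null) {
            queue.remove(info);
        }
    }

    // One pass of the drainer: heartbeat only if fragments are pending,
    // but always put the entry back so it is never lost at 0 fragments.
    boolean drainOnce() {
        AmNodeInfo info = queue.poll();
        if (info == null) {
            return false;
        }
        boolean sent = false;
        if (info.pendingFragments > 0) {
            info.heartbeatsSent++; // stand-in for the real heartbeat call
            sent = true;
        }
        queue.add(info);
        return sent;
    }

    int queueSize() {
        return queue.size();
    }
}
```

The point of the sketch is the unconditional re-queue in drainOnce: an entry
that is drained while it has 0 fragments stays in the queue, so fragments
registered later are still picked up by the next pass.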
cc [~prasanth_j] for review.
> queued containers may timeout if they don't get to run for a long time
> ----------------------------------------------------------------------
>
> Key: HIVE-16094
> URL: https://issues.apache.org/jira/browse/HIVE-16094
> Project: Hive
> Issue Type: Bug
> Affects Versions: 2.2.0
> Reporter: Siddharth Seth
> Assignee: Siddharth Seth
> Priority: Critical
> Attachments: HIVE-16094.01.patch
>
>
> I believe this happened after HIVE-15958 - since we end up keeping amNodeInfo
> in knownAppMasters, and that can result in the callable not being scheduled on
> new task registration.