[
https://issues.apache.org/jira/browse/TEZ-2609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14620033#comment-14620033
]
Bikas Saha commented on TEZ-2609:
---------------------------------
There is probably an existing jira to make the AM send the next heartbeat time
to the tasks based on load. Since the AM is ultimately the bottleneck, it should
be the place that decides the time interval, rather than each task making a
local decision.
Thoughts?
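
To make the idea concrete, here is a rough sketch of how the AM could pick the
interval from its own load. Everything in it is an assumption for illustration:
the class, the method, and the notion of a fixed "heartbeats per second" budget
are hypothetical and are not existing Tez APIs or configs.

```java
// Hypothetical sketch only; none of these names exist in Tez.
final class AmHeartbeatIntervalSketch {
    private AmHeartbeatIntervalSketch() {}

    /**
     * Spreads the heartbeats of all running tasks over the AM's assumed
     * per-second budget, then clamps the result to [minMs, maxMs].
     */
    static long nextIntervalMs(int runningTasks, int amHeartbeatsPerSec,
                               long minMs, long maxMs) {
        if (amHeartbeatsPerSec <= 0 || minMs > maxMs) {
            throw new IllegalArgumentException("invalid budget or bounds");
        }
        long tasks = Math.max(0, runningTasks);
        // Ceiling of tasks * 1000 / budget: the interval at which the
        // combined heartbeat rate stays within the assumed budget.
        long spreadMs = (tasks * 1000L + amHeartbeatsPerSec - 1) / amHeartbeatsPerSec;
        return Math.max(minMs, Math.min(maxMs, spreadMs));
    }
}
```

The AM would return this value in each heartbeat response, and the task would
sleep for that long instead of using its locally configured interval. The lower
bound keeps small jobs responsive; the upper bound keeps tasks from going
silent long enough to look dead.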
However, there is a heuristic in the task that immediately asks the AM for
more events if the AM sends the task the maximum number of events for a single
heartbeat. That is probably premature, because the task hasn't even started,
let alone finished, processing the events it has just received.
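
As a sketch of the difference, the two policies below compute how long the task
waits before its next heartbeat. The first mirrors the heuristic as described
above; the second is only one possible alternative, which defers the early
re-fetch until the received events have been consumed. All names are
hypothetical and do not correspond to actual TaskReporter code.

```java
// Hypothetical sketch only; not the actual TaskReporter logic.
final class TaskHeartbeatDelaySketch {
    private TaskHeartbeatDelaySketch() {}

    /** Heuristic as described: a full batch triggers an immediate re-fetch. */
    static long delayWithImmediateRefetch(int eventsReceived, int maxEventsPerHeartbeat,
                                          long intervalMs) {
        return eventsReceived >= maxEventsPerHeartbeat ? 0L : intervalMs;
    }

    /**
     * Assumed alternative: a full batch still hints that more events are
     * waiting at the AM, but the task skips the wait only once it has no
     * unprocessed events left locally.
     */
    static long delayAfterProcessing(int eventsReceived, int maxEventsPerHeartbeat,
                                     int unprocessedEvents, long intervalMs) {
        boolean moreLikelyAtAm = eventsReceived >= maxEventsPerHeartbeat;
        return (moreLikelyAtAm && unprocessedEvents == 0) ? 0L : intervalMs;
    }
}
```

With the second policy a task that has just received a full batch keeps to the
regular interval while it is still working through that batch, which should
reduce back-to-back heartbeats hitting the AM on large jobs.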
> Consider calling TaskReporter...->..heartbeat() when there are no more events
> to be processed in LogicalIOProcessorRuntimeTask
> ------------------------------------------------------------------------------------------------------------------------------
>
> Key: TEZ-2609
> URL: https://issues.apache.org/jira/browse/TEZ-2609
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Rajesh Balamohan
>
> The default TEZ_TASK_AM_HEARTBEAT_INTERVAL_MS is around 100 ms. This works for
> most use cases. However, for large jobs (10000s of tasks), this can be
> a problem and cause timeout issues. Setting it to a very large value would
> degrade job runtime, while a lower value can cause timeout issues for large
> jobs.
> It might be worth considering deferring heartbeat() when there are events to
> be processed in LogicalIOProcessorRuntimeTask.