[
https://issues.apache.org/jira/browse/TEZ-4103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ahmed Hussein updated TEZ-4103:
-------------------------------
Attachment: TEZ-4103.006.patch
> Progress in DAG, Vertex, and tasks is incorrect
> -----------------------------------------------
>
> Key: TEZ-4103
> URL: https://issues.apache.org/jira/browse/TEZ-4103
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Ahmed Hussein
> Assignee: Ahmed Hussein
> Priority: Major
> Attachments: TEZ-4103.001.patch, TEZ-4103.002.patch,
> TEZ-4103.003.patch, TEZ-4103.004.patch, TEZ-4103.005.patch, TEZ-4103.006.patch
>
>
> Looking at the progress code, there are a few issues that could lead to
> problems when calculating the progress.
> There are some cases in which the progress never reaches 1.0.
> This is a list of issues that need to be fixed in the progress code:
> * After TEZ-3982, since values are skipped, in some cases the
> progress of a DAG or a vertex may never reach 1.0f. This is in both
> "{{DAGImpl.java}}" and "{{ProgressHelper.java}}".
> * {{ProgressHelper}} schedules a service to update the progress, dubbed
> `{{ProgressHelper.monitorProgress}}`. According to the Java documentation:
> {quote}If any execution of the task encounters an exception,
> subsequent executions are suppressed.
> Otherwise, the task will only terminate via cancellation
> or termination of the executor.
> {quote}
> In other words, if the service dies, there is no way to catch that in the
> code and the progress will never be updated.
> * The `{{SimpleProcessor.inputMap}}` is not thread-safe. It is
> initialized as a `{{LinkedHashMap}}` and there is no synchronization on the
> field objects in the map. This could be problematic in a concurrent context.
> * `{{VertexImpl.getProgress()}}` does not check the range of the progress
> calculated in `{{VertexImpl.computeProgress()}}`
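
The range check mentioned in the last item could look roughly like the following. This is only a minimal sketch and not the code from any of the attached patches: the class ProgressClamp and its clamp method are hypothetical names, and treating NaN as 0.0f is an assumption made for illustration.

```java
// Hypothetical helper, not part of Tez: limits a computed progress value
// to the range a progress bar can display.
final class ProgressClamp {
    private ProgressClamp() {}

    /** Returns progress limited to [0.0f, 1.0f]. Assumption: NaN maps to 0.0f. */
    static float clamp(float progress) {
        if (Float.isNaN(progress)) {
            return 0.0f;
        }
        return Math.max(0.0f, Math.min(1.0f, progress));
    }

    public static void main(String[] args) {
        // Values outside the range are pulled back to the nearest bound.
        System.out.println(clamp(1.2f));
        System.out.println(clamp(-0.1f));
        System.out.println(clamp(0.5f));
    }
}
```

A getter such as getProgress() could pass its computed value through a helper like this before returning it, so callers never see a value outside [0, 1].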
>
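
The scheduled-task item in the description above can be illustrated with a small sketch. It is not taken from the Tez code base: the class GuardedMonitor and its methods are hypothetical, and the use of scheduleWithFixedDelay is an assumption, since the description does not say which scheduling call ProgressHelper uses. The idea is to catch exceptions inside the periodic task so that one failed run does not suppress all later runs.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch, not Tez code.
final class GuardedMonitor {
    private GuardedMonitor() {}

    /** Wraps a task so that a RuntimeException in one run does not cancel later runs. */
    static Runnable guarded(Runnable task) {
        return () -> {
            try {
                task.run();
            } catch (RuntimeException e) {
                // Assumption: real code would report this through its own logger.
                System.err.println("progress update failed: " + e);
            }
        };
    }

    /** Schedules a task that fails on its first run; returns true if it still runs three times. */
    static boolean survivesFailure() {
        CountDownLatch threeRuns = new CountDownLatch(3);
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        Runnable flaky = () -> {
            boolean firstRun = threeRuns.getCount() == 3;
            threeRuns.countDown();
            if (firstRun) {
                throw new IllegalStateException("simulated failure");
            }
        };
        try {
            scheduler.scheduleWithFixedDelay(guarded(flaky), 0, 10, TimeUnit.MILLISECONDS);
            return threeRuns.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            scheduler.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("kept running after failure: " + survivesFailure());
    }
}
```

Without the guarded(...) wrapper, the first exception would end the periodic execution silently, which matches the quoted documentation. The wrapper only catches RuntimeException; an Error thrown by the task would still stop the schedule.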
--
This message was sent by Atlassian Jira
(v8.3.4#803005)