[
https://issues.apache.org/jira/browse/TEZ-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14129680#comment-14129680
]
Jeff Zhang commented on TEZ-1345:
---------------------------------
[~hitesh] Attached the new patch:
* Removed vertexName from VertexDataMovementEventsGeneratedEvent and use
vertexId instead, for the unit test
* bq. any reason for using synchronized as compared to using something like a
LinkedBlockingQueue for the cached events? Does not need to be changed but just
curious as to whether other options were considered?
Using a LinkedBlockingQueue may still cause onRootVertexInitialized to return
init events from 2 inputs. On second thought, I think a ConcurrentHashMap
would be much better, so the new patch uses ConcurrentHashMap.
* bq. Regd. the test in TestDAGRecovery, the test should likely pass even if
the caching fix is not applied. The issue only shows up in cases where there is
a vertex which has an additional input as well as an inbound edge to it from
another vertex. This can be addressed as part of the overall recovery
end-to-end regression tests jira.
Without the fix, the test fails even when the root vertex has only one
additional input. The init event would be written after the VertexInitedEvent,
which causes the recovery issue.
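
To illustrate the ConcurrentHashMap point above, here is a minimal Java sketch
of caching init events per input. It is not taken from the patch: the class
InitEventCache, its methods, and the use of plain strings as events are
hypothetical stand-ins. The idea is that a lookup keyed by input name hands
back the events of exactly one input, whereas a single shared queue holds the
events of all inputs together.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch; none of these names come from the Tez code base.
class InitEventCache {
    // One entry per root input, keyed by input name (assumed to be unique).
    private final Map<String, List<String>> eventsByInput = new ConcurrentHashMap<>();

    // Store a copy of the events produced by one input's initializer.
    public void cache(String inputName, List<String> events) {
        eventsByInput.put(inputName, new ArrayList<>(events));
    }

    // Atomically remove and return the events of exactly one input.
    // Returns an empty list if nothing is cached for that input.
    public List<String> take(String inputName) {
        List<String> events = eventsByInput.remove(inputName);
        return events == null ? Collections.<String>emptyList() : events;
    }

    // True once every cached input has been taken.
    public boolean isEmpty() {
        return eventsByInput.isEmpty();
    }

    public static void main(String[] args) {
        InitEventCache cache = new InitEventCache();
        cache.cache("inputA", Arrays.asList("a1", "a2"));
        cache.cache("inputB", Arrays.asList("b1"));

        // Only inputA's events come back; inputB's stay cached.
        System.out.println(cache.take("inputA"));
        System.out.println(cache.isEmpty());
    }
}
```

With a shared queue, the caller handling inputA would have to filter out the
inputB events itself; keying by input name avoids that step.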
> Add checks to guarantee all init events are written to recovery to consider
> vertex initialized
> ----------------------------------------------------------------------------------------------
>
> Key: TEZ-1345
> URL: https://issues.apache.org/jira/browse/TEZ-1345
> Project: Apache Tez
> Issue Type: Sub-task
> Reporter: Hitesh Shah
> Assignee: Jeff Zhang
> Attachments: Tez-1345-10.patch, Tez-1345-2.patch, Tez-1345-3.patch,
> Tez-1345-4.patch, Tez-1345-5.patch, Tez-1345-6.patch, Tez-1345-7.patch,
> Tez-1345-8.patch, Tez-1345-9.patch, Tez-1345.patch
>
>
> Related to issue discovered in TEZ-1033