[ 
https://issues.apache.org/jira/browse/TEZ-1909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14366825#comment-14366825
 ] 

Jeff Zhang commented on TEZ-1909:
---------------------------------

Attach the new patch to address the review comment.
Apart from the issues in the review comments, I also found there's one issue 
about RecoveryService. For the scenario of draining the events before 
RecoverySerivce is stopped, previously I take the event queue's size eqaul to 
zero as an indication of events are all consumed, but it is not true. Because 
even if the event queue is empty, the event may still being processing. I fix 
this bug in the new patch just like AsyncDispatcher did. 

bq. the "if (skipAllOtherEvents) {" check is probably also needed at the top of 
the loop to prevent new files from being opened and read ( in addition to 
short-circuiting the read of all events in the given file ). Maybe just log a 
message that other files were present and skipped
Fix it. also add unit test in TestRecoveryParser

bq. any reason why this is needed in the DAGAppMaster "Set<String> getDagIDs()" 
?
Only for unit test. But in the new patch, I remove it and initialize the Set in 
the setup method.

bq. also, we should add a test for adding corrupt data to the summary stream 
and ensuring that its processing fails
Done.

bq. I do not see TEZ_AM_RECOVERY_HANDLE_REMAINING_EVENT_WHEN_STOPPED being used 
anywhere apart from being set to true in one of the tests.
Fix it.

bq. please replace "import com.sun.tools.javac.util.List;" with java.lang.List
Fix it

bq. testCorruptedLastRecord should also verify that the dag submitted event was 
seen.
Done. verify DAGAppMaster.createDAG is invoked.






> Remove need to copy over all events from attempt 1 to attempt 2 dir
> -------------------------------------------------------------------
>
>                 Key: TEZ-1909
>                 URL: https://issues.apache.org/jira/browse/TEZ-1909
>             Project: Apache Tez
>          Issue Type: Sub-task
>            Reporter: Hitesh Shah
>            Assignee: Jeff Zhang
>         Attachments: TEZ-1909-1.patch, TEZ-1909-2.patch, TEZ-1909-3.patch
>
>
> Use of file versions should prevent the need for copying over data into a 
> second attempt dir. Care needs to be taken to handle "last corrupt record" 
> handling. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to