[jira] [Commented] (TEZ-3437) Improve synchronization and the progress report behavior for Inputs from TEZ-3317

Siddharth Seth (JIRA) Tue, 20 Sep 2016 15:25:53 -0700

    [ 
https://issues.apache.org/jira/browse/TEZ-3437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15507984#comment-15507984
 ]


Siddharth Seth commented on TEZ-3437:
-------------------------------------

bq. Yes. For 0 inputs how should the progress be reported?
1.0f - always complete.

bq. Fixed method signatures to be consistent. Let me know if I am missing 
something here.
Individual inputs don't need to throw the Exception. The question was really 
around why IOException and InterruptedException. I'd change this to either 
Exception - similar to the other methods (but not very useful), or 
InterruptedException + a TezCheckedException or a custom 
ProgressFailedException. Anyone else has thoughts on this? Processors may want 
to handle this, hence a checked Exception.

bq. My assumption here is that totalBytesRead and totalFileBytes will be used 
as and when available. As a background thread for progress update, should it 
matter that the read and writes to progress value is not synchronized? 
Appreciate some more comments on how we can make this better and what would you 
do to fix this.
They're read and updated in different threads. It's possible for the reading 
thread to always see a cached copy. Could use Atomic vars to fix this, or 
progress specific locks. This is mainly related to visibility.

bq. Since some processors dont have the conf objects, should we make a more 
elaborate change for the processors to pick up conf as well(add conf as a 
member maybe?).
Most of the ones which don't have a config are test processors. I don't think 
they need change. I'm curious about Pig and Hive will handle this though - 
hardcode to 100ms, or? What does the framework recommend.

{code}
+    if (runtimeTask.getProgress() != progress) {
+      runtimeTask.setProgress(progress);
+      notifyProgress();
+    }
{code}
Not sure why this change was required in the original patch? Can we let 
progress reporting continue as is? This progress is used as a heartbeat for the 
task timeout monitor. (TezConfiguration#TEZ_TASK_PROGRESS_STUCK_INTERVAL_MS).
Earlier, this timeout handling would rely upon the processor explicitly setting 
progress - which would typically be done after n rows. That's no longer the 
case. Even if the processor keeps reporting progress - 0.0f for a long time, 
while it waits for instance, it'll end up timing out.
With the check completely removed - we have a thread running which reports 
progress - and makes the processor 'report after n entries' fairly useless.

The extra thread to report progress works nicely for the test processors. For 
Pig, and Hive - I'd imagine they'd want progress to be reported after a certain 
number of entries are processed - rather than a heartbeat thread.

Any thoughts on reporting progress based on Outputs?







> Improve synchronization and the progress report behavior for Inputs from 
> TEZ-3317
> ---------------------------------------------------------------------------------
>
>                 Key: TEZ-3437
>                 URL: https://issues.apache.org/jira/browse/TEZ-3437
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Kuhu Shukla
>            Assignee: Kuhu Shukla
>         Attachments: TEZ-3437.001.patch, TEZ-3437.002.patch, 
> TEZ-3437.003.patch
>
>
> Follow up from TEZ-3317 to improve the getProgress thread synchronization and 
> replace timerTasks with ScheduledExecutorService. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (TEZ-3437) Improve synchronization and the progress report behavior for Inputs from TEZ-3317

Reply via email to