[ https://issues.apache.org/jira/browse/TEZ-3437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15507984#comment-15507984 ]
Siddharth Seth commented on TEZ-3437: ------------------------------------- bq. Yes. For 0 inputs how should the progress be reported? 1.0f - always complete. bq. Fixed method signatures to be consistent. Let me know if I am missing something here. Individual inputs don't need to throw the Exception. The question was really around why IOException and InterruptedException. I'd change this to either Exception - similar to the other methods (but not very useful), or InterruptedException + a TezCheckedException or a custom ProgressFailedException. Anyone else has thoughts on this? Processors may want to handle this, hence a checked Exception. bq. My assumption here is that totalBytesRead and totalFileBytes will be used as and when available. As a background thread for progress update, should it matter that the read and writes to progress value is not synchronized? Appreciate some more comments on how we can make this better and what would you do to fix this. They're read and updated in different threads. It's possible for the reading thread to always see a cached copy. Could use Atomic vars to fix this, or progress specific locks. This is mainly related to visibility. bq. Since some processors dont have the conf objects, should we make a more elaborate change for the processors to pick up conf as well(add conf as a member maybe?). Most of the ones which don't have a config are test processors. I don't think they need change. I'm curious about Pig and Hive will handle this though - hardcode to 100ms, or? What does the framework recommend. {code} + if (runtimeTask.getProgress() != progress) { + runtimeTask.setProgress(progress); + notifyProgress(); + } {code} Not sure why this change was required in the original patch? Can we let progress reporting continue as is? This progress is used as a heartbeat for the task timeout monitor. (TezConfiguration#TEZ_TASK_PROGRESS_STUCK_INTERVAL_MS). Earlier, this timeout handling would rely upon the processor explicitly setting progress - which would typically be done after n rows. That's no longer the case. Even if the processor keeps reporting progress - 0.0f for a long time, while it waits for instance, it'll end up timing out. With the check completely removed - we have a thread running which reports progress - and makes the processor 'report after n entries' fairly useless. The extra thread to report progress works nicely for the test processors. For Pig, and Hive - I'd imagine they'd want progress to be reported after a certain number of entries are processed - rather than a heartbeat thread. Any thoughts on reporting progress based on Outputs? > Improve synchronization and the progress report behavior for Inputs from > TEZ-3317 > --------------------------------------------------------------------------------- > > Key: TEZ-3437 > URL: https://issues.apache.org/jira/browse/TEZ-3437 > Project: Apache Tez > Issue Type: Bug > Reporter: Kuhu Shukla > Assignee: Kuhu Shukla > Attachments: TEZ-3437.001.patch, TEZ-3437.002.patch, > TEZ-3437.003.patch > > > Follow up from TEZ-3317 to improve the getProgress thread synchronization and > replace timerTasks with ScheduledExecutorService. -- This message was sent by Atlassian JIRA (v6.3.4#6332)