[
https://issues.apache.org/jira/browse/TEZ-2918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14984701#comment-14984701
]
Bikas Saha commented on TEZ-2918:
---------------------------------
I think the sorting, merging and spilling code that uses the Progressable
object inherits its progress notification behavior from MR, where it calls
progress() after processing a block of records and not on every record. I can
check that again.
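To make the block-wise pattern concrete, here is a minimal sketch. It is not
the actual MR or Tez code: ProgressCallback, BatchedProgressReporter and the
block size are hypothetical names chosen for illustration, standing in for a
Progressable-style callback.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for a Progressable-style callback (not the Hadoop interface).
interface ProgressCallback {
    void progress();
}

// Forwards one progress() call per block of records instead of one per record.
class BatchedProgressReporter {
    private final ProgressCallback callback;
    private final int recordsPerNotification;
    private int sinceLastNotification = 0;

    BatchedProgressReporter(ProgressCallback callback, int recordsPerNotification) {
        if (recordsPerNotification <= 0) {
            throw new IllegalArgumentException("recordsPerNotification must be positive");
        }
        this.callback = callback;
        this.recordsPerNotification = recordsPerNotification;
    }

    // Called for every record; only the last record of each block reaches the callback.
    void recordProcessed() {
        if (++sinceLastNotification >= recordsPerNotification) {
            sinceLastNotification = 0;
            callback.progress();
        }
    }
}

class BatchedProgressDemo {
    // Processes `records` records and returns how many progress() calls were made.
    static long run(int records, int recordsPerNotification) {
        AtomicLong calls = new AtomicLong();
        BatchedProgressReporter reporter =
            new BatchedProgressReporter(calls::incrementAndGet, recordsPerNotification);
        for (int i = 0; i < records; i++) {
            reporter.recordProcessed();
        }
        return calls.get();
    }

    public static void main(String[] args) {
        System.out.println(run(10_000, 1_000));
    }
}
```

With a block size of 1000, processing 10000 records results in 10 calls to
progress(), so the per-record cost is one increment and one comparison.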
So, e.g., in collect, on a write the record is just added to the current
buffer. It does not cause any tight-loop operation (sort etc.) until it needs
to spill. However, if the user code is calling write() in a tight loop then
yes, this is a tight loop. Same for read in MRInput.
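The collect path described above can be sketched roughly as follows. This is
an assumption-laden toy, not the Tez sorter: SpillingCollector, its capacity
and the in-memory sort that stands in for a real spill are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Toy collector: write() only appends; the expensive work happens at spill time.
class SpillingCollector {
    private final int capacity;
    private final List<String> buffer = new ArrayList<>();
    private int spills = 0;

    SpillingCollector(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    // Cheap path: add the record to the current buffer, spill only when it is full.
    void write(String record) {
        buffer.add(record);
        if (buffer.size() >= capacity) {
            spill();
        }
    }

    // Expensive path: sorting here stands in for sort + write to disk.
    private void spill() {
        Collections.sort(buffer);
        buffer.clear();
        spills++;
    }

    int spillCount() {
        return spills;
    }

    int bufferedRecords() {
        return buffer.size();
    }
}
```

A caller that invokes write() in a tight loop therefore spends most
iterations in the append branch, which is why the cost of any per-record
progress call matters there.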
This is what we were discussing in TEZ-808: how to make the progress call
cheap enough that it should not matter. However, with cross-thread visibility
not guaranteed in the Oracle JVM, because it does not guarantee that a busy
thread will ever be interrupted, we had to use volatile. I took a cue from the
patch where Gopal changed the generic counters to use an atomic long instead
of a synchronized block, which removed the perf bottleneck. Each read/write
increments a counter, so that should be on the code path already. Atomic
vars effectively have a volatile read/write visibility barrier.
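As a sketch of why an atomic counter already gives the needed visibility:
the worker increments a java.util.concurrent.atomic.AtomicLong per record,
and another thread can detect progress by comparing snapshots of it. The
class names below are hypothetical, and detecting progress from the counter
is an assumption made for illustration, not what the patch does.

```java
import java.util.concurrent.atomic.AtomicLong;

// Detects progress by comparing the counter with the value seen at the last check.
// Intended to be called from a single monitoring thread.
class CounterProgressMonitor {
    private final AtomicLong counter;
    private long lastSeen;

    CounterProgressMonitor(AtomicLong counter) {
        this.counter = counter;
        this.lastSeen = counter.get();
    }

    // AtomicLong.get() has volatile-read semantics, so updates made by
    // other threads via incrementAndGet() are visible here.
    boolean madeProgressSinceLastCheck() {
        long now = counter.get();
        boolean progressed = now != lastSeen;
        lastSeen = now;
        return progressed;
    }
}

class AtomicCounterDemo {
    // Several threads increment one shared counter without a synchronized block.
    static long countWithThreads(int threads, int incrementsPerThread) {
        AtomicLong counter = new AtomicLong();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < incrementsPerThread; j++) {
                    counter.incrementAndGet();
                }
            });
            workers[i].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return counter.get();
    }
}
```

No increments are lost under contention, and the reader never needs a lock;
whether this is cheap enough per record is the part that has to be measured.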
In any case, perf can only be measured. While doing all the perf work you
mention, did we create any perf benchmark code or test that can be used to
measure this before and after? Could you please point me to ways of measuring
this that have been used earlier?
> Make progress notifications in IOs
> ----------------------------------
>
> Key: TEZ-2918
> URL: https://issues.apache.org/jira/browse/TEZ-2918
> Project: Apache Tez
> Issue Type: Sub-task
> Reporter: Bikas Saha
> Assignee: Bikas Saha
> Attachments: TEZ-2918.1.patch, TEZ-2918.2.patch
>
>