[ 
https://issues.apache.org/jira/browse/TEZ-3680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15960208#comment-15960208
 ] 

Siddharth Seth commented on TEZ-3680:
-------------------------------------

On using finalMergeEnabled for PipelinedShuffle - With the change, it serves 
the same purpose as PipelinedShuffle. I think it is better to handle this 
properly in a separate jira, where EnableFinalMerge=false + pipelined=false 
means "Send N events at the end", instead of sending them as they are 
generated.

numThreads: Assuming you are trying this out with multiple buffers. Instead of 
a FixedThreadPool, this could be a cached thread pool, for cases where multiple 
buffers are not created or the spill from a single thread is fast enough.
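
To illustrate the difference, here is a minimal sketch using only 
java.util.concurrent. It is not the Tez code: the class and method names 
(SpillExecutorSketch, newCachedSpillExecutor, largestPoolSizeAfterOneSpill) 
are hypothetical, and the empty task stands in for a real spill. A cached pool 
starts with no threads and only creates one when a task arrives, so a job that 
never needs parallel spills does not pay for idle threads.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadPoolExecutor;

public class SpillExecutorSketch {

    // Hypothetical factory (not a Tez API). newCachedThreadPool() is backed by
    // a ThreadPoolExecutor with corePoolSize 0, so no thread exists until the
    // first task is submitted, and idle threads are reclaimed after 60 seconds.
    static ThreadPoolExecutor newCachedSpillExecutor() {
        return (ThreadPoolExecutor) Executors.newCachedThreadPool();
    }

    // Submits a single placeholder "spill", waits for it to finish, and
    // reports the largest number of threads the pool ever held.
    static int largestPoolSizeAfterOneSpill() {
        ThreadPoolExecutor pool = newCachedSpillExecutor();
        try {
            pool.submit(() -> { /* placeholder for the real spill work */ }).get();
            return pool.getLargestPoolSize();
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        ThreadPoolExecutor pool = newCachedSpillExecutor();
        System.out.println("threads before first spill: " + pool.getPoolSize());
        pool.shutdown();
        System.out.println("largest pool size after one spill: "
                + largestPoolSizeAfterOneSpill());
    }
}
```

One trade-off to keep in mind: a cached pool has no upper bound on threads, 
so if spills can pile up, a ThreadPoolExecutor with an explicit maximum may be 
the safer variant.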

counter and notifyProgress - It may be a little too infrequent now. Maybe this 
should be per N records instead of per buffer. Also, the notify per partition 
(not in the tight loop) and the notify in the final merge likely need to 
happen more often. We can't have a disk write or the final merge cause a 
timeout.
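
A rough sketch of the "per N records" idea, in plain Java. Everything here is 
hypothetical (RecordProgressNotifier, recordWritten, the interval value); in 
the real writer the callback would be whatever the output context uses to 
report progress. The point is only that the notify frequency is tied to the 
record count rather than to buffer boundaries.

```java
public class ProgressThrottleSketch {

    // Hypothetical helper: runs the callback once every `interval` records.
    static final class RecordProgressNotifier {
        private final int interval;
        private final Runnable notifyProgress;
        private int sinceLastNotify = 0;

        RecordProgressNotifier(int interval, Runnable notifyProgress) {
            if (interval <= 0) {
                throw new IllegalArgumentException("interval must be positive");
            }
            this.interval = interval;
            this.notifyProgress = notifyProgress;
        }

        // Call once per record from the write path; only a counter increment
        // and a comparison happen on most calls.
        void recordWritten() {
            if (++sinceLastNotify >= interval) {
                sinceLastNotify = 0;
                notifyProgress.run();
            }
        }
    }

    // Writes `records` placeholder records and counts how many times the
    // progress callback fired.
    static int countNotifications(int records, int interval) {
        int[] fired = {0};
        RecordProgressNotifier notifier =
                new RecordProgressNotifier(interval, () -> fired[0]++);
        for (int i = 0; i < records; i++) {
            notifier.recordWritten();
        }
        return fired[0];
    }

    public static void main(String[] args) {
        System.out.println(countNotifications(10_000, 1_000));
    }
}
```

The same helper could be called from the per-partition spill loop and from the 
final merge, so that a slow disk write still produces regular notifications.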

> Optimizations to UnorderedPartitionedKVWriter
> ---------------------------------------------
>
>                 Key: TEZ-3680
>                 URL: https://issues.apache.org/jira/browse/TEZ-3680
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Rajesh Balamohan
>         Attachments: profiler.png, TEZ-3680.1.patch, TEZ-3680.2.patch
>
>
> 1. Consider increasing the number of threads in spill executor. 
> {{TEZ_RUNTIME_UNORDERED_OUTPUT_MAX_PER_BUFFER_SIZE_BYTES}} can be used to 
> configure the buffer size. If smaller buffer sizes are provided, there is a 
> chance of getting frequent spills; currently the spill executor operates in 
> single threaded mode.
> 2. During profiling, things like incrementing the counters, notifying 
> progress came up. This may not be common in regular tez jobs. But in 
> processes like LLAP (hive based), it is possible to get into such situations. 
> I will attach the profiler snapshot showing this. It would be good to 
> update/notify less frequently.
> 3. Optimize mergeAll().



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
