[ https://issues.apache.org/jira/browse/TEZ-3680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rajesh Balamohan updated TEZ-3680:
----------------------------------
Attachment: TEZ-3680.3.patch
Thanks @sseth.
1. Removed the finalMergeEnabled changes. Will create a separate ticket for that.
2. numThreads: Changed to a cachedThreadPool.
3. counter/notifyProgress: Now updated once per partition. This still reduces
the number of calls by a factor of 1009 in cases where the number of partitions
is higher. The reason for not going with the number of records is that we do
not know the size of every record that would fit in the buffer.
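
For readers following along, here is a rough sketch of the idea behind items 2
and 3. It is not the code in the patch. The class, field, and method names are
hypothetical, and plain AtomicLong values stand in for the Tez counters and the
progress notification. Executors.newCachedThreadPool() is the standard JDK
factory, and using exactly that call here is an assumption.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration only; not taken from TEZ-3680.3.patch.
class PartitionLevelProgressSketch {
    // Stand-in for the output-records counter (assumption: the real code uses Tez counters).
    static final AtomicLong outputRecordsCounter = new AtomicLong();
    // Tracks how many counter/progress updates were issued, to show the reduction.
    static final AtomicLong counterUpdateCalls = new AtomicLong();

    // Spills one buffer; recordsPerPartition[i] is the number of records in partition i.
    static void spill(int[] recordsPerPartition) {
        for (int records : recordsPerPartition) {
            // ... this partition's records would be written to the spill file here ...
            // One update per partition instead of one update per record.
            outputRecordsCounter.addAndGet(records);
            counterUpdateCalls.incrementAndGet();
        }
    }

    // Submits every buffer to a cached thread pool so spills can run concurrently.
    // Returns {total records counted, number of counter updates}.
    static long[] run(int[][] buffers) throws InterruptedException {
        outputRecordsCounter.set(0);
        counterUpdateCalls.set(0);
        ExecutorService spillExecutor = Executors.newCachedThreadPool();
        for (int[] buffer : buffers) {
            spillExecutor.submit(() -> spill(buffer));
        }
        spillExecutor.shutdown();
        if (!spillExecutor.awaitTermination(30, TimeUnit.SECONDS)) {
            throw new IllegalStateException("spills did not finish in time");
        }
        return new long[] {outputRecordsCounter.get(), counterUpdateCalls.get()};
    }

    public static void main(String[] args) throws InterruptedException {
        int[][] buffers = { {100, 200, 300}, {50, 50, 50} };
        long[] result = run(buffers);
        // prints "records=750 counterUpdates=6"
        System.out.println("records=" + result[0] + " counterUpdates=" + result[1]);
    }
}
```

In this toy run, 750 records produce 6 counter updates, one per partition per
spill, rather than 750. A cached pool creates threads on demand and reuses idle
ones, so frequent small spills do not queue behind a single thread.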
> Optimizations to UnorderedPartitionedKVWriter
> ---------------------------------------------
>
> Key: TEZ-3680
> URL: https://issues.apache.org/jira/browse/TEZ-3680
> Project: Apache Tez
> Issue Type: Improvement
> Reporter: Rajesh Balamohan
> Attachments: profiler.png, TEZ-3680.1.patch, TEZ-3680.2.patch,
> TEZ-3680.3.patch
>
>
> 1. Consider increasing the number of threads in the spill executor.
> {{TEZ_RUNTIME_UNORDERED_OUTPUT_MAX_PER_BUFFER_SIZE_BYTES}} can be used to
> configure the buffer size. If smaller buffer sizes are provided, there is a
> chance of getting frequent spills; currently the spill executor operates in
> single-threaded mode.
> 2. During profiling, operations like incrementing the counters and notifying
> progress showed up. This may not be common in regular Tez jobs, but in
> processes like LLAP (Hive-based), it is possible to get into such situations.
> I will attach the profiler snapshot showing this. It would be good to
> update/notify less frequently.
> 3. Optimize mergeAll().
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)