[ https://issues.apache.org/jira/browse/TEZ-3680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rajesh Balamohan updated TEZ-3680:
----------------------------------
Attachment: TEZ-3680.2.patch
Unordered*Writer previously honored only the pipelined shuffle setting. Pipelined
shuffle may well be disabled, because that config is shared by sorted and unsorted
outputs. For the unordered writers, however,
{{TEZ_RUNTIME_ENABLE_FINAL_MERGE_IN_OUTPUT}} can also be considered when deciding
whether to disable the final merge. The current patch addresses this.
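A minimal sketch of the decision described above. It assumes the writer always skips the final merge when pipelined shuffle is on (each spill is shipped separately) and otherwise consults the final-merge setting. The class and method names (`FinalMergeDecisionSketch`, `shouldRunFinalMerge`) are hypothetical, and the two booleans stand in for values read from the runtime configuration. This is an illustration, not the code in the patch.

```java
public class FinalMergeDecisionSketch {

    /**
     * Hypothetical helper: decides whether an unordered writer should merge
     * its spills into a single final output.
     *
     * @param pipelinedShuffleEnabled   value of the pipelined shuffle setting
     * @param finalMergeInOutputEnabled value of TEZ_RUNTIME_ENABLE_FINAL_MERGE_IN_OUTPUT
     */
    public static boolean shouldRunFinalMerge(boolean pipelinedShuffleEnabled,
                                              boolean finalMergeInOutputEnabled) {
        // Assumption: with pipelined shuffle each spill is sent on its own,
        // so there is nothing to merge at the end.
        if (pipelinedShuffleEnabled) {
            return false;
        }
        // Pipelined shuffle may be off only because the setting is shared with
        // sorted outputs, so the final-merge setting is honored as well.
        return finalMergeInOutputEnabled;
    }

    public static void main(String[] args) {
        System.out.println(shouldRunFinalMerge(false, true));  // true
        System.out.println(shouldRunFinalMerge(false, false)); // false
        System.out.println(shouldRunFinalMerge(true, true));   // false
    }
}
```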
> Optimizations to UnorderedPartitionedKVWriter
> ---------------------------------------------
>
> Key: TEZ-3680
> URL: https://issues.apache.org/jira/browse/TEZ-3680
> Project: Apache Tez
> Issue Type: Improvement
> Reporter: Rajesh Balamohan
> Attachments: profiler.png, TEZ-3680.1.patch, TEZ-3680.2.patch
>
>
> 1. Consider increasing the number of threads in the spill executor.
> {{TEZ_RUNTIME_UNORDERED_OUTPUT_MAX_PER_BUFFER_SIZE_BYTES}} can be used to
> configure the buffer size. Smaller buffer sizes can lead to frequent
> spills, and the spill executor currently runs in single-threaded mode.
> 2. During profiling, operations such as incrementing counters and notifying
> progress showed up as hotspots. This may not be common in regular Tez jobs,
> but it can happen in processes such as LLAP (Hive-based). I will attach a
> profiler snapshot showing this. It would be good to update counters and
> notify progress less frequently.
> 3. Optimize mergeAll().
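For item 1 above, a minimal sketch of a spill executor with a configurable thread count, built on a plain `java.util.concurrent` fixed thread pool. `SpillExecutorSketch`, `expectedSpills`, `spillAll`, and the `spillThreads` parameter are hypothetical. The task body only counts bytes where a real writer would write a spill file, and it ignores spill numbering and ordering, which a real implementation would have to preserve.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;

public class SpillExecutorSketch {

    // Assumption: one spill per filled buffer, so spills = ceil(total / bufferSize).
    public static long expectedSpills(long totalBytes, long bufferSizeBytes) {
        return (totalBytes + bufferSizeBytes - 1) / bufferSizeBytes;
    }

    // Submits one spill task per filled buffer to a fixed-size pool and
    // returns the total number of bytes "spilled".
    public static long spillAll(List<byte[]> filledBuffers, int spillThreads) {
        ExecutorService spillExecutor = Executors.newFixedThreadPool(spillThreads);
        AtomicLong spilledBytes = new AtomicLong();
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (byte[] buffer : filledBuffers) {
                pending.add(spillExecutor.submit(() -> {
                    // A real writer would serialize the buffer to a spill file here.
                    spilledBytes.addAndGet(buffer.length);
                }));
            }
            for (Future<?> f : pending) {
                try {
                    f.get(); // surface any failure from a spill task
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting for spills", e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException("spill failed", e.getCause());
                }
            }
        } finally {
            spillExecutor.shutdown();
        }
        return spilledBytes.get();
    }

    public static void main(String[] args) {
        System.out.println("spills for 64 MB with 1 MB buffers: "
                + expectedSpills(64L << 20, 1L << 20));
        List<byte[]> buffers = new ArrayList<>();
        for (int i = 0; i < 8; i++) {
            buffers.add(new byte[1 << 20]);
        }
        System.out.println("bytes spilled with 4 threads: " + spillAll(buffers, 4));
    }
}
```

Under the one-spill-per-buffer assumption, 64 MB of output with 1 MB buffers means 64 spills; on a single-threaded executor each of those waits behind the previous one, which is the case a larger pool is meant to help.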
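For item 2 above, one way to update and notify less frequently is to batch counter increments locally and publish them, together with a progress notification, every N records. This is a minimal sketch assuming a single writer thread. `BatchedCounterSketch` and `FLUSH_INTERVAL` are hypothetical, and an `AtomicLong` and a `Runnable` stand in for the real Tez counter and progress callback.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

public class BatchedCounterSketch {

    // Hypothetical batch size; not a Tez configuration property.
    public static final int FLUSH_INTERVAL = 1000;

    private final AtomicLong sharedCounter;  // stands in for a shared task counter
    private final Runnable progressNotifier; // stands in for a progress callback
    private long pending;                    // records not yet published (single writer thread)

    public BatchedCounterSketch(AtomicLong sharedCounter, Runnable progressNotifier) {
        this.sharedCounter = sharedCounter;
        this.progressNotifier = progressNotifier;
    }

    // Called once per record; touches shared state only every FLUSH_INTERVAL records.
    public void recordWritten() {
        pending++;
        if (pending >= FLUSH_INTERVAL) {
            flush();
        }
    }

    // Publishes any pending count and notifies progress; also call this on close.
    public void flush() {
        if (pending > 0) {
            sharedCounter.addAndGet(pending);
            pending = 0;
            progressNotifier.run();
        }
    }

    public static void main(String[] args) {
        AtomicLong records = new AtomicLong();
        AtomicInteger notifications = new AtomicInteger();
        BatchedCounterSketch counter =
                new BatchedCounterSketch(records, notifications::incrementAndGet);
        for (int i = 0; i < 2500; i++) {
            counter.recordWritten();
        }
        counter.flush(); // publish the remaining 500 records
        System.out.println(records.get() + " records, " + notifications.get() + " notifications");
    }
}
```

Running `main` prints `2500 records, 3 notifications`: two notifications come from full batches and one from the final `flush()`, instead of one shared update per record.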