[
https://issues.apache.org/jira/browse/TEZ-3673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15960344#comment-15960344
]
Siddharth Seth commented on TEZ-3673:
-------------------------------------
Don't think a new configuration is required to say "USE 32M buffers". Buffer
size control should already be possible via
tez.runtime.unordered.output.max-per-buffer.size-bytes ?
Instead, I think a configuration is required which indicates when a spill
should happen. Something like:
<=0 -> Spill each buffer individually
0-100 -> Trigger point, as a percentage of the entire buffer, which will cause
a spill (rounded up to a per-buffer boundary). 75% of 10 buffers would mean
spill after 8 buffers.
With Final merge avoidance, we would spill after each buffer.
With Final merge enabled, spill less frequently.
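A rough sketch of the trigger-point arithmetic described above, to make the rounding concrete. The class and method names here are hypothetical and not part of Tez, and clamping values above 100 down to 100 is an assumption; the comment only defines behaviour for <=0 and 0-100.

```java
// Hypothetical helper, not an existing Tez class.
final class SpillTrigger {
    private SpillTrigger() {}

    /**
     * Returns how many filled buffers should accumulate before a spill.
     *
     * spillPercent <= 0  -> spill each buffer individually (returns 1)
     * spillPercent 1-100 -> percentage of all buffers, rounded up to a
     *                       whole number of buffers
     * Values above 100 are clamped to 100 (an assumption of this sketch).
     */
    static int buffersBeforeSpill(int spillPercent, int numBuffers) {
        if (numBuffers <= 0) {
            throw new IllegalArgumentException("numBuffers must be positive");
        }
        if (spillPercent <= 0) {
            return 1;
        }
        int clamped = Math.min(spillPercent, 100);
        // Integer ceiling of (numBuffers * clamped / 100); long avoids overflow.
        long product = (long) numBuffers * clamped;
        return (int) ((product + 99) / 100);
    }

    public static void main(String[] args) {
        System.out.println(buffersBeforeSpill(75, 10));  // prints 8
        System.out.println(buffersBeforeSpill(0, 10));   // prints 1
        System.out.println(buffersBeforeSpill(100, 10)); // prints 10
    }
}
```

With final merge avoidance the caller would effectively always use the <=0 case; with final merge enabled, a higher percentage means fewer, larger spills.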
cc [~rajesh.balamohan] - any thoughts on this from a performance standpoint?
> Allocate smaller buffers in UnorderedPartitionedKVWriter
> --------------------------------------------------------
>
> Key: TEZ-3673
> URL: https://issues.apache.org/jira/browse/TEZ-3673
> Project: Apache Tez
> Issue Type: Improvement
> Reporter: Harish Jaiprakash
> Assignee: Harish Jaiprakash
> Attachments: TEZ-3673.01.patch
>
>
> UnorderedPartitionedKVWriter allocates in bigger chunks. It may or may not
> get filled up. In PipelinedSorter, we start off with 32MB chunks. But
> UnorderedPartitionedKVWriter can be worse as it allocates bigger blocks. Need
> to revisit this allocation.