[ 
https://issues.apache.org/jira/browse/TEZ-3673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15960344#comment-15960344
 ] 

Siddharth Seth commented on TEZ-3673:
-------------------------------------

I don't think a new configuration is required to say "use 32 MB buffers". 
Shouldn't buffer size control already be possible via 
tez.runtime.unordered.output.max-per-buffer.size-bytes?

Instead, I think a configuration is required which indicates when a spill 
should happen. Something like:
<=0 -> Spill each buffer individually.
0-100 -> Trigger point, as a percentage of the entire buffer space, at which 
a spill happens (rounded up to the next per-buffer boundary). For example, 
75% of 10 buffers would mean spill after 8 buffers.
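
The trigger rule above can be sketched in a few lines of Java. This is only 
an illustration of the arithmetic, not code from the patch: the class 
SpillTrigger, the method buffersBeforeSpill, and the parameter names are all 
hypothetical, and rejecting values above 100 is an assumption of this sketch.

```java
/** Hypothetical helper illustrating the proposed spill trigger; not part of Tez. */
final class SpillTrigger {
    private SpillTrigger() {}

    /**
     * Returns how many filled buffers should accumulate before a spill.
     *
     * @param spillPercent proposed setting: <= 0 spills every buffer,
     *                     otherwise a percentage (up to 100) of all buffers
     * @param numBuffers   total number of buffers available to the writer
     */
    static int buffersBeforeSpill(int spillPercent, int numBuffers) {
        if (numBuffers < 1) {
            throw new IllegalArgumentException("numBuffers must be >= 1");
        }
        if (spillPercent > 100) {
            // Assumption: values above 100 are treated as invalid.
            throw new IllegalArgumentException("spillPercent must be <= 100");
        }
        if (spillPercent <= 0) {
            // Spill each buffer individually.
            return 1;
        }
        // Integer ceiling of numBuffers * spillPercent / 100, so the trigger
        // always lands on a whole-buffer boundary.
        long scaled = (long) numBuffers * spillPercent;
        return (int) ((scaled + 99) / 100);
    }

    public static void main(String[] args) {
        // 75% of 10 buffers is 7.5, which rounds up to a whole buffer count.
        System.out.println(buffersBeforeSpill(75, 10));
    }
}
```

Rounding up rather than down means a small positive percentage still waits 
for at least one full buffer before spilling.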

With final merge avoidance, we would spill after each buffer.
With final merge enabled, we would spill less frequently.

cc [~rajesh.balamohan] - any thoughts on this from a performance standpoint?

> Allocate smaller buffers in UnorderedPartitionedKVWriter
> --------------------------------------------------------
>
>                 Key: TEZ-3673
>                 URL: https://issues.apache.org/jira/browse/TEZ-3673
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Harish Jaiprakash
>            Assignee: Harish Jaiprakash
>         Attachments: TEZ-3673.01.patch
>
>
> UnorderedPartitionedKVWriter allocates in bigger chunks. It may or may not 
> get filled up. In PipelinedSorter, we start off with 32MB chunks. But 
> UnorderedPartitionedKVWriter can be worse as it allocates bigger blocks. Need 
> to revisit this allocation.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
