[ 
https://issues.apache.org/jira/browse/TEZ-3206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15261573#comment-15261573
 ] 

Siddharth Seth commented on TEZ-3206:
-------------------------------------

Comments on the patch
- In SpillCallback.onSuccess, 
updateGlobalSizePerPartition(result.wrappedBuffer) is invoked after the 
wrappedBuffer has been reset, so the sizes it reads may already have been 
cleared. The update should happen before the reset.
- When a buffer is not being used (e.g. single partition), I think it would be 
better not to set the size in a wrappedBuffer instance. (With a single 
partition and no buffers, we should ideally not even create the wrappedBuffer; 
that becomes harder to fix if the size stats are always set in wrappedBuffer.) 
Instead, updateGlobalSizePerPartition could just accept a long array, which 
either comes from the buffer or is set up explicitly.
- An additional test for the pipelined case would be useful.
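
A minimal sketch of the first two points, assuming hypothetical stand-in 
classes (SketchBuffer and PartitionSizeSketch are illustrative names, not the 
Tez classes in the patch): the update takes a plain long array, and the spill 
callback reads the buffer's sizes before resetting it.

```java
import java.util.Arrays;

/** Hypothetical stand-in for the writer's per-spill buffer; not the Tez class. */
final class SketchBuffer {
    final long[] sizePerPartition;

    SketchBuffer(int numPartitions) {
        this.sizePerPartition = new long[numPartitions];
    }

    /** Clears the per-partition sizes so the buffer can be reused. */
    void reset() {
        Arrays.fill(sizePerPartition, 0L);
    }
}

/** Hypothetical holder for the writer-wide partition sizes. */
final class PartitionSizeSketch {
    private final long[] globalSizePerPartition;

    PartitionSizeSketch(int numPartitions) {
        this.globalSizePerPartition = new long[numPartitions];
    }

    /** Accepts a plain array, so callers without a buffer can pass sizes directly. */
    synchronized void updateGlobalSizePerPartition(long[] sizes) {
        if (sizes.length != globalSizePerPartition.length) {
            throw new IllegalArgumentException("unexpected partition count");
        }
        for (int i = 0; i < sizes.length; i++) {
            globalSizePerPartition[i] += sizes[i];
        }
    }

    /** Spill-success path: record the sizes first, then reset the buffer. */
    void onSpillSuccess(SketchBuffer buffer) {
        updateGlobalSizePerPartition(buffer.sizePerPartition);
        buffer.reset();
    }

    synchronized long[] snapshot() {
        return globalSizePerPartition.clone();
    }
}
```

In the no-buffer case the caller would build the array itself and call 
updateGlobalSizePerPartition directly, without creating a buffer at all.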

When using this, one thing to note is the possibility of data from the same 
task being repeated in case of retries.
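
One way a consumer of these stats could guard against that, sketched under 
the assumption that each task attempt reports its sizes in a single event 
(RetryAwareStatsSketch and its methods are hypothetical names, not part of 
ShuffleVertexManager): key the stats by task index so a retry replaces the 
earlier report instead of adding to it.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical consumer-side aggregator; not an existing Tez class. */
final class RetryAwareStatsSketch {
    private final int numPartitions;
    // Latest reported sizes per task index; a retry overwrites the older entry.
    private final Map<Integer, long[]> latestSizesByTask = new HashMap<>();

    RetryAwareStatsSketch(int numPartitions) {
        this.numPartitions = numPartitions;
    }

    /** Records one report; assumes a single stats event per task attempt. */
    void onPartitionStats(int taskIndex, long[] sizes) {
        if (sizes.length != numPartitions) {
            throw new IllegalArgumentException("unexpected partition count");
        }
        latestSizesByTask.put(taskIndex, sizes.clone());
    }

    /** Sums the latest report of every task, so each task is counted once. */
    long[] totals() {
        long[] totals = new long[numPartitions];
        for (long[] sizes : latestSizesByTask.values()) {
            for (int i = 0; i < numPartitions; i++) {
                totals[i] += sizes[i];
            }
        }
        return totals;
    }
}
```

With pipelined output an attempt may send several events, so the key would 
also need the attempt and spill identifiers; this sketch does not cover that.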

This ends up sending estimates rather than real sizes. I'm not sure how much 
difference real sizes will make in the use case you are targeting, but 
supporting both could be an option: send estimates or send real sizes. The 
VMEvent could be modified to indicate which one is being sent. 
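
As a rough illustration of that option, the payload could carry a marker next 
to the sizes. Everything below is hypothetical (PartitionStatsPayloadSketch 
and SizeKind are made-up names); it is plain Java, not the actual 
VertexManagerEvent payload format.

```java
/** Hypothetical payload shape; not the real VertexManagerEvent payload. */
final class PartitionStatsPayloadSketch {
    /** Tells the receiver how to interpret the reported sizes. */
    enum SizeKind { ESTIMATE, EXACT }

    private final SizeKind kind;
    private final long[] sizePerPartition;

    PartitionStatsPayloadSketch(SizeKind kind, long[] sizePerPartition) {
        this.kind = kind;
        // Defensive copy so later changes by the sender do not leak in.
        this.sizePerPartition = sizePerPartition.clone();
    }

    boolean isExact() {
        return kind == SizeKind.EXACT;
    }

    long sizeOf(int partition) {
        return sizePerPartition[partition];
    }
}
```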


> Have unordered partitioned KV output send partition stats via 
> VertexManagerEvent 
> ---------------------------------------------------------------------------------
>
>                 Key: TEZ-3206
>                 URL: https://issues.apache.org/jira/browse/TEZ-3206
>             Project: Apache Tez
>          Issue Type: New Feature
>            Reporter: Ming Ma
>            Assignee: Ming Ma
>         Attachments: TEZ-3206.patch
>
>
> As part of the auto-parallelism feature, ordered partitioned KV output's 
> partition stats are sent to ShuffleVertexManager via VertexManagerEvent. But 
> this isn't available for unordered partitioned output. Having 
> {{UnorderedPartitionedKVWriter}} send partition stats will enable the 
> auto-parallelism support for unordered KV or other custom data routing 
> mechanisms that depend on partition size.



