[ https://issues.apache.org/jira/browse/PIG-2888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dmitriy V. Ryaboy updated PIG-2888:
-----------------------------------
Attachment: partialagg_patch_4.patch
Significant improvements to transitions from raw to processed map. Better
memory utilization estimation. Better logging.
While profiling, I also noticed an inordinate amount of time being spent in
Distinct$Initial's bag registration, and fixed that.
The task that I cited earlier as taking 57 seconds with this patch now takes
30 seconds. I also saw a 40% speed improvement vs. the older version of this
patch on a production job.
Please review :).
> Improve performance of POPartialAgg
> -----------------------------------
>
> Key: PIG-2888
> URL: https://issues.apache.org/jira/browse/PIG-2888
> Project: Pig
> Issue Type: Improvement
> Reporter: Dmitriy V. Ryaboy
> Assignee: Dmitriy V. Ryaboy
> Attachments: partialagg_patch_1.patch, partialagg_patch_2.patch,
> partialagg_patch_3.patch, partialagg_patch_4.patch
>
>
> During performance testing, we found that POPartialAgg can cause performance
> degradation for Pig jobs when the Algebraic UDFs it's being applied to aren't
> well suited to the operator's assumptions. Changing the implementation to a
> more flexible hash-based model can provide significant performance
> improvements.
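
For readers unfamiliar with the idea, here is a minimal sketch of what hash-based partial aggregation looks like in general. It is not the code in the attached patch and not Pig's POPartialAgg: the class name PartialAggSketch, the maxKeys flush threshold, and the use of a plain sum as the aggregate are all assumptions made for illustration. Partial results are kept per key in a hash map and emitted downstream whenever the map grows past a limit.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; names and thresholds are assumptions,
// not taken from the PIG-2888 patch.
final class PartialAggSketch {
    // Assumed flush trigger: number of distinct keys held in memory.
    private final int maxKeys;
    // Running partial aggregate (here: a sum) for each key seen so far.
    private final Map<String, Long> partial = new HashMap<>();
    // Stand-in for "records sent to the next operator".
    private final List<Map.Entry<String, Long>> emitted = new ArrayList<>();

    PartialAggSketch(int maxKeys) {
        this.maxKeys = maxKeys;
    }

    // Fold one input record into the partial aggregate for its key.
    void add(String key, long value) {
        partial.merge(key, value, Long::sum);
        if (partial.size() > maxKeys) {
            flush();
        }
    }

    // Emit every partial aggregate and start over with an empty map.
    void flush() {
        for (Map.Entry<String, Long> e : partial.entrySet()) {
            emitted.add(new AbstractMap.SimpleImmutableEntry<>(e.getKey(), e.getValue()));
        }
        partial.clear();
    }

    List<Map.Entry<String, Long>> emitted() {
        return emitted;
    }

    public static void main(String[] args) {
        PartialAggSketch agg = new PartialAggSketch(2);
        agg.add("a", 1);
        agg.add("b", 2);
        agg.add("a", 3); // combined with the earlier "a" in memory
        agg.flush();     // end of input: emit whatever is left
        System.out.println(agg.emitted());
    }
}
```

Because a key may be flushed more than once, the downstream stage still has to combine the partial results for each key. That is why this approach only fits aggregates that can be merged in stages, which is the property Algebraic UDFs provide.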