[
https://issues.apache.org/jira/browse/PIG-3325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mark Wagner updated PIG-3325:
-----------------------------
Attachment: PIG-3325.optimize.1.patch
The core issue is that getMemorySize() is O(N) if a new element has been added
since the last call. I've made that case O(1). However, this patch only brings
the time per add() call down to ~4500 ns, and the job still takes 2x as long as
it does on 0.10.1.
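For illustration only (this is not the code in the attached patch, and
SizeTrackingBag, estimate() and the 16-byte overhead are made-up names and
numbers), one way to get an O(1) getMemorySize() is to keep a running total
that add() updates, instead of re-walking the contents after every insertion:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical bag with a running size estimate; not Pig's DefaultDataBag. */
public class SizeTrackingBag {
    // Assumed fixed per-element overhead in bytes; purely illustrative.
    private static final long OVERHEAD = 16;

    private final List<String> items = new ArrayList<>();
    private long runningBytes = 0;

    // Rough per-element estimate: overhead plus two bytes per char.
    private static long estimate(String item) {
        return OVERHEAD + 2L * item.length();
    }

    // O(1): the estimate is updated incrementally on each insertion.
    public void add(String item) {
        items.add(item);
        runningBytes += estimate(item);
    }

    // O(1): returns the running total, no traversal needed.
    public long getMemorySize() {
        return runningBytes;
    }

    // O(N): the "walk everything" approach, kept here only for comparison.
    public long recomputeMemorySize() {
        long total = 0;
        for (String item : items) {
            total += estimate(item);
        }
        return total;
    }

    public static void main(String[] args) {
        SizeTrackingBag bag = new SizeTrackingBag();
        bag.add("abc");
        bag.add("de");
        // Both approaches should agree on the estimate.
        System.out.println(bag.getMemorySize() == bag.recomputeMemorySize());
    }
}
```

The tradeoff in a sketch like this is that every add() pays for an estimate,
which is cheap here but may not be for real tuples.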
[~dvryaboy], can you share some of your experience with the issue you saw for
PIG-2923? I have a nice test job for this issue, but I don't have a benchmark
for bag spilling performance, so I'm not sure how much of a problem small
bags were for spilling, or what a good tradeoff between add() speed and spill
speed would be.
> Adding a tuple to a bag is slow
> -------------------------------
>
> Key: PIG-3325
> URL: https://issues.apache.org/jira/browse/PIG-3325
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.11, 0.11.1, 0.11.2
> Reporter: Mark Wagner
> Assignee: Mark Wagner
> Priority: Critical
> Attachments: PIG-3325.demo.patch, PIG-3325.optimize.1.patch
>
>
> The time it takes to add a tuple to a bag has increased significantly,
> causing some jobs to take about 50x longer than on 0.10.1. I've tracked
> this down to PIG-2923, which made adding a tuple more expensive (it now
> includes some memory estimation).