[ 
https://issues.apache.org/jira/browse/PIG-3325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13657750#comment-13657750
 ] 

Aniket Mokashi edited comment on PIG-3325 at 10/20/13 4:30 AM:
---------------------------------------------------------------

The core issue is that getMemorySize() is O(N) whenever a new element has been 
added since the last call. I've made that case O(1). However, this patch only 
brings the time per add() call down to ~4500 ns, and job time is still 2x that 
of 0.10.1.
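
To illustrate the general idea of turning an O(N) size scan into an O(1) 
lookup, here is a minimal sketch. It is not the attached patch and not Pig's 
actual bag code: the class SizedBag, the interface Sized, and the method 
recomputeMemorySize() are hypothetical names used only for illustration. The 
assumption is that each element can report its own estimated size, so the bag 
can keep a running total on add() instead of re-scanning its contents.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical bag that tracks its estimated size incrementally (not Pig's code). */
public class SizedBag {

    /** Hypothetical element type: anything that can report an estimated size in bytes. */
    public interface Sized {
        long getMemorySize();
    }

    private final List<Sized> contents = new ArrayList<>();

    // Running total of element sizes, updated on every add().
    private long runningSize = 0L;

    /** Adds an element and updates the cached size with constant extra bookkeeping. */
    public void add(Sized t) {
        contents.add(t);
        runningSize += t.getMemorySize();
    }

    /** O(1): returns the cached total instead of re-scanning the contents. */
    public long getMemorySize() {
        return runningSize;
    }

    /** O(N) full scan, kept only to cross-check the cached value. */
    public long recomputeMemorySize() {
        long total = 0L;
        for (Sized t : contents) {
            total += t.getMemorySize();
        }
        return total;
    }

    public static void main(String[] args) {
        SizedBag bag = new SizedBag();
        bag.add(() -> 16L);
        bag.add(() -> 24L);
        System.out.println("cached=" + bag.getMemorySize()
                + " rescanned=" + bag.recomputeMemorySize());
    }
}
```

Note that in this sketch every add() still pays for one per-element size 
call, so it moves cost into add() rather than removing it. Sampling only some 
elements would make add() cheaper at the price of a less accurate estimate.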

[~dvryaboy], can you share some of your experience with the issue you saw for 
PIG-2923? I have a nice test job for this issue, but I don't really have a 
benchmark for bag spilling performance, so I'm not sure how big of an issue 
small bags were for spilling, or what a good tradeoff between add() speed and 
spill speed would be.



was (Author: mwagner):
The core issue is that getMemorySize() is O(N) if a new element has been added 
since the last call. I've made that case O(1). However, this patch only brings 
the call time for adding tuples to ~4500 ns (and job time is still 2x that of 
0.10.1).

[~dvryaboy], can you share some of your experience with the issue you saw for 
PIG-2293? I have a nice test job for this issue but I don't have any benchmark 
really for bag spilling performance, so I'm not sure how big of an issue small 
bags were for spilling, or what a good tradeoff between add() speed and spill 
speed would be.


> Adding a tuple to a bag is slow
> -------------------------------
>
>                 Key: PIG-3325
>                 URL: https://issues.apache.org/jira/browse/PIG-3325
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.11, 0.12.0, 0.11.1, 0.11.2
>            Reporter: Mark Wagner
>            Assignee: Dmitriy V. Ryaboy
>            Priority: Critical
>             Fix For: 0.12.1
>
>         Attachments: PIG-3325.2.patch, PIG-3325.3.patch, PIG-3325.demo.patch, 
> PIG-3325.optimize.1.patch
>
>
> The time it takes to add a tuple to a bag has increased significantly, 
> causing some jobs to take about 50x longer compared to 0.10.1. I've tracked 
> this down to PIG-2923, which has made adding a tuple heavier weight (it now 
> includes some memory estimation).



--
This message was sent by Atlassian JIRA
(v6.1#6144)
