> On Oct. 29, 2014, 7:17 p.m., Daniel Dai wrote:
> > http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POPartialAgg.java,
> >  line 624
> > <https://reviews.apache.org/r/27169/diff/1/?file=732909#file732909line624>
> >
> >     We shall not wait here until spill done, this will hold gc hook and 
> > does not seems do any good. Can we just do a synchronous aggregation and 
> > quit the hook?

Holding gc notification hook for long should not be a problem. It was the 
previous behaviour as well with spilling of very big bags taking a very long 
time. Do not want to do aggregation in the spill thread as we have lot of 
thread locals(jobconf, udfcontext etc) and those might not be set in the thread 
used by gc hook. With that realized should not be doing sizereduction in 
spill(). Will post a new updated patch.


> On Oct. 29, 2014, 7:17 p.m., Daniel Dai wrote:
> > http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestPOPartialAgg.java,
> >  line 319
> > <https://reviews.apache.org/r/27169/diff/1/?file=732911#file732911line319>
> >
> >     The test fail, my first result is (2,(-1)), and second result is 
> > (1,(120))

Fixed it. There is no order guarantee while iterating over hashmap.


- Rohini


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27169/#review58901
-----------------------------------------------------------


On Nov. 2, 2014, 12:56 p.m., Rohini Palaniswamy wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/27169/
> -----------------------------------------------------------
> 
> (Updated Nov. 2, 2014, 12:56 p.m.)
> 
> 
> Review request for pig.
> 
> 
> Bugs: PIG-3979
>     https://issues.apache.org/jira/browse/PIG-3979
> 
> 
> Repository: pig
> 
> 
> Description
> -------
> 
> Fixed a couple of issues with POPartialAgg
>     - Made the spill of POPartialAgg synchronous so that System.gc() in 
> SpillableMemoryManager actually frees up memory.
>     - Avoid lot of redundant aggregateSecondLevel() calls
>     - Fixed the SpillableMemoryManager to not invoke extraGC if POPartialAgg
>     - Made variables transient which are not required to be serialized in the 
> plan
> 
> 
> Diffs
> -----
> 
>   
> http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POPartialAgg.java
>  1635881 
>   
> http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/impl/util/GroupingSpillable.java
>  PRE-CREATION 
>   
> http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/impl/util/SpillableMemoryManager.java
>  1635881 
>   
> http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestPOPartialAgg.java
>  1635881 
> 
> Diff: https://reviews.apache.org/r/27169/diff/
> 
> 
> Testing
> -------
> 
> Unit tests added to TestPOPartialAgg. Ran couple of manual e2e tests to check 
> behaviour.
> 
> 
> Thanks,
> 
> Rohini Palaniswamy
> 
>

Reply via email to