> On Oct. 29, 2014, 7:17 p.m., Daniel Dai wrote: > > http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POPartialAgg.java, > > line 624 > > <https://reviews.apache.org/r/27169/diff/1/?file=732909#file732909line624> > > > > We shall not wait here until spill done, this will hold gc hook and > > does not seems do any good. Can we just do a synchronous aggregation and > > quit the hook?
Holding gc notification hook for long should not be a problem. It was the previous behaviour as well with spilling of very big bags taking a very long time. Do not want to do aggregation in the spill thread as we have lot of thread locals(jobconf, udfcontext etc) and those might not be set in the thread used by gc hook. With that realized should not be doing sizereduction in spill(). Will post a new updated patch. > On Oct. 29, 2014, 7:17 p.m., Daniel Dai wrote: > > http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestPOPartialAgg.java, > > line 319 > > <https://reviews.apache.org/r/27169/diff/1/?file=732911#file732911line319> > > > > The test fail, my first result is (2,(-1)), and second result is > > (1,(120)) Fixed it. There is no order guarantee while iterating over hashmap. - Rohini ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/27169/#review58901 ----------------------------------------------------------- On Nov. 2, 2014, 12:56 p.m., Rohini Palaniswamy wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/27169/ > ----------------------------------------------------------- > > (Updated Nov. 2, 2014, 12:56 p.m.) > > > Review request for pig. > > > Bugs: PIG-3979 > https://issues.apache.org/jira/browse/PIG-3979 > > > Repository: pig > > > Description > ------- > > Fixed a couple of issues with POPartialAgg > - Made the spill of POPartialAgg synchronous so that System.gc() in > SpillableMemoryManager actually frees up memory. > - Avoid lot of redundant aggregateSecondLevel() calls > - Fixed the SpillableMemoryManager to not invoke extraGC if POPartialAgg > - Made variables transient which are not required to be serialized in the > plan > > > Diffs > ----- > > > http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POPartialAgg.java > 1635881 > > http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/impl/util/GroupingSpillable.java > PRE-CREATION > > http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/impl/util/SpillableMemoryManager.java > 1635881 > > http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestPOPartialAgg.java > 1635881 > > Diff: https://reviews.apache.org/r/27169/diff/ > > > Testing > ------- > > Unit tests added to TestPOPartialAgg. Ran couple of manual e2e tests to check > behaviour. > > > Thanks, > > Rohini Palaniswamy > >
