Ouch, I was getting tons of exceptions after turning on map-side aggregation:
java.lang.OutOfMemoryError: Java heap space
        at java.lang.StringCoding$StringEncoder.encode(StringCoding.java:232)
        at java.lang.StringCoding.encode(StringCoding.java:272)
        at java.lang.String.getBytes(String.java:947)
        at org.apache.hadoop.hive.serde2.thrift.TBinarySortableProtocol.writeString(TBinarySortableProtocol.java:299)
        at org.apache.hadoop.hive.serde2.dynamic_type.DynamicSerDeTypeString.serialize(DynamicSerDeTypeString.java:65)
        at org.apache.hadoop.hive.serde2.dynamic_type.DynamicSerDeFieldList.serialize(DynamicSerDeFieldList.java:249)
        at org.apache.hadoop.hive.serde2.dynamic_type.DynamicSerDeStructBase.serialize(DynamicSerDeStructBase.java:81)
        at org.apache.hadoop.hive.serde2.dynamic_type.DynamicSerDe.serialize(DynamicSerDe.java:174)
        at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.process(ReduceSinkOperator.java:153)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:306)
        at org.apache.hadoop.hive.ql.exec.GroupByOperator.forward(GroupByOperator.java:564)
        at org.apache.hadoop.hive.ql.exec.GroupByOperator.close(GroupByOperator.java:582)
        at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:263)
        at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:263)
        at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:263)
        at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:96)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
        at org.apache.hadoop.mapred.Child.main(Child.java:155)

java.lang.OutOfMemoryError: GC overhead limit exceeded
        at org.apache.hadoop.hive.ql.exec.GroupByOperator.forward(GroupByOperator.java:552)
        at org.apache.hadoop.hive.ql.exec.GroupByOperator.close(GroupByOperator.java:582)
        at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:263)
        at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:263)
        at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:263)
        at org.apache.hadoop.hive.ql.exec.ExecMapper.close(ExecMapper.java:96)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
        at org.apache.hadoop.mapred.Child.main(Child.java:155)

java.io.IOException: Task process exit with nonzero status of 1.
...

Just to confirm: is this a bug, or is it by design?

On Fri, Feb 27, 2009 at 10:02 AM, Namit Jain <[email protected]> wrote:

> Yes, it flushes the data when the hash table is occupying too much memory
>
> From: Qing Yan [mailto:[email protected]]
> Sent: Thursday, February 26, 2009 5:58 PM
> To: [email protected]
> Subject: Re: Combine() optimization
>
> Got it.
>
> Does map-side aggregation have any special requirements about the dataset?
> E.g. the number of unique GROUP BY keys could be too big to hold
> in memory. Will it still work?
>
> On Fri, Feb 27, 2009 at 5:50 AM, Zheng Shao <[email protected]> wrote:
>
> Hi Qing,
>
> We did think about the Combiner when we started Hive. However, earlier
> discussions led us to believe that hash-based aggregation inside the mapper
> would be as competitive as using a combiner in most cases.
>
> In order to enable map-side aggregation, we just need to do the following
> before running the Hive query:
> set hive.map.aggr=true;
>
> Zheng
>
> On Thu, Feb 26, 2009 at 6:03 AM, Raghu Murthy <[email protected]> wrote:
>
> Right now Hive does not exploit the combiner. But hash-based map-side
> aggregation in Hive (controlled by hints) provides a similar optimization.
> Using the combiner in addition to map-side aggregation should improve the
> performance even more if the combiner can further aggregate the partial
> aggregates generated from the mapper.
>
> On 2/26/09 5:57 AM, "Qing Yan" <[email protected]> wrote:
>
> > Is there any way/plan for Hive to take advantage of M/R's combine()
> > phase? There can be either rules embedded in the query optimizer or hints
> > passed by the user...
> > GROUP BY should benefit from this a lot.
> >
> > Any comment?
>
> --
> Yours,
> Zheng
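
For anyone hitting the same OutOfMemoryError: the flush behavior Namit describes is governed by a handful of configuration properties. The sketch below is only illustrative; hive.map.aggr comes straight from Zheng's reply, while the other property names are standard Hive and Hadoop settings whose availability and defaults depend on the version you are running, so treat the values as a starting point rather than a prescription.

    -- enable hash-based map-side aggregation (as in Zheng's reply above)
    set hive.map.aggr=true;

    -- flush partial aggregates once the in-memory hash table uses roughly this
    -- fraction of the mapper heap (assumed available in your Hive build)
    set hive.map.aggr.hash.percentmemory=0.25;

    -- after this many input rows, Hive checks whether map-side aggregation is
    -- actually shrinking the data; if the reduction is worse than
    -- hive.map.aggr.hash.min.reduction, it falls back to reduce-side aggregation
    set hive.groupby.mapaggr.checkinterval=100000;
    set hive.map.aggr.hash.min.reduction=0.5;

    -- give each task JVM more heap; this is a Hadoop job setting, not a Hive one
    set mapred.child.java.opts=-Xmx1024m;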

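Qing's question about the number of unique GROUP BY keys is the important caveat. Here is a rough sketch of the two cases, using a hypothetical page_views table (the table and column names are made up for illustration):

    -- Few distinct keys per mapper: the hash table stays small and map-side
    -- aggregation acts much like a combiner, sharply cutting shuffle volume.
    set hive.map.aggr=true;
    SELECT country, COUNT(1) FROM page_views GROUP BY country;

    -- Nearly unique keys: the hash table holds close to one entry per input row,
    -- so little is saved and it can exhaust the heap (as in the trace above)
    -- unless it is flushed early or map-side aggregation is turned off.
    SELECT user_id, COUNT(1) FROM page_views GROUP BY user_id;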