I found that GroupByOperator caches the aggregation results for every distinct key in a hash map.
Please look at the code below:
    AggregationBuffer[] aggs = null;
    boolean newEntryForHashAggr = false;

    keyProber.hashcode = newKeys.hashCode();
    // use this to probe the hashmap
    keyProber.keys = newKeys;

    // hash-based aggregations
    aggs = hashAggregations.get(keyProber);
    ArrayList<Object> newDefaultKeys = null;
    if (aggs == null) {
      newDefaultKeys = deepCopyElements(keyObjects, keyObjectInspectors,
          ObjectInspectorCopyOption.WRITABLE);
      KeyWrapper newKeyProber = new KeyWrapper(keyProber.hashcode,
          newDefaultKeys, true);
      aggs = newAggregations();
      hashAggregations.put(newKeyProber, aggs);
      newEntryForHashAggr = true;
      numRowsHashTbl++; // new entry in the hash table
    }



When there are 1,000,000 different keys and the aggregation value for each
key is about 10 KB, the cache will occupy about 10 GB of memory and the JVM
will run out of memory. Could anybody tell me how to handle this problem?
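
To make the estimate concrete, here is a rough sketch of the arithmetic. The class and method names are hypothetical and are not part of Hive. It assumes "10k" means 10,000 bytes per key and ignores the overhead of the hash map entries and the key objects themselves, so the real footprint would likely be higher.

```java
// Hypothetical helper, not part of Hive: a back-of-the-envelope estimate of
// how much memory the cached aggregation buffers would need.
class HashAggrMemoryEstimate {

    // Returns numKeys * bytesPerKey, failing loudly on overflow instead of
    // silently wrapping around.
    static long estimateBytes(long numKeys, long bytesPerKey) {
        if (numKeys < 0 || bytesPerKey < 0) {
            throw new IllegalArgumentException("arguments must be non-negative");
        }
        return Math.multiplyExact(numKeys, bytesPerKey);
    }

    public static void main(String[] args) {
        long numKeys = 1_000_000L;   // distinct group-by keys (from my case)
        long bytesPerKey = 10_000L;  // assumed size of one key's aggregation value
        long total = estimateBytes(numKeys, bytesPerKey);
        System.out.println(total + " bytes, about " + (total / 1e9) + " GB");
    }
}
```

Since one entry is added to hashAggregations for each new key, the estimate grows linearly with the number of distinct keys.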



Thanks,

LiuLei
