My guess would be that for hashing, a hash table is pre-allocated based on
the number of keys in the hash.  That would explain why with more keys, the
memory usage grows.  But that's just my guess.  Someone who really
understands how this works should chime in :).

-- Zelaine

On Fri, May 27, 2016 at 10:36 AM, rahul challapalli <
challapallira...@gmail.com> wrote:

> Any inputs on this one?
>
> On Wed, May 25, 2016 at 7:51 PM, rahul challapalli <
> challapallira...@gmail.com> wrote:
>
> > Its using hash aggregation.
> > On May 25, 2016 7:48 PM, "Zelaine Fong" <zf...@maprtech.com> wrote:
> >
> >> What does the explain plan show?  I.e., is the group by being done via a
> >> hash agg or a streaming agg?  If it's a streaming agg, then you still
> have
> >> to sort the entire data set before you reduce it down to a single group.
> >> That would explain the increase in memory as you add group by keys.
> >>
> >> -- Zelaine
> >>
> >> On Wed, May 25, 2016 at 5:50 PM, rahul challapalli <
> >> challapallira...@gmail.com> wrote:
> >>
> >> > I am trying to understand the memory usage patterns for hash
> aggregate.
> >> The
> >> > below query completes in 9.163 seconds and uses 24 MB of memory for
> >> > hash-aggregate (according to profile)
> >> >
> >> > select max(d.l_linenumber) from (select l_linenumber, 'asdf' c1,
> 'kfjhl'
> >> > c2, 'reyui' c3, 'khdfs' c4, 'vkhj' c5  from mem_heavy1) d group by
> d.c1,
> >> > d.c2, d.c3, d.c4, d.c5;
> >> >
> >> > Adding one more constant column to the group by, the below query takes
> >> > 11.638 seconds and uses 29 MB of ram
> >> >
> >> > select max(d.l_linenumber) from (select l_linenumber, 'asdf' c1,
> 'kfjhl'
> >> > c2, 'reyui' c3, 'khdfs' c4, 'vkhj' c5, 'bmkr' c6  from mem_heavy1) d
> >> group
> >> > by d.c1, d.c2, d.c3, d.c4, d.c5, d.c6;
> >> >
> >> > The below query with one more constant column added to group by 14.622
> >> > seconds and uses 33 MB memory
> >> >
> >> > select max(d.l_linenumber) from (select l_linenumber, 'asdf' c1,
> 'kfjhl'
> >> > c2, 'reyui' c3, 'khdfs' c4, 'vkhj' c5, 'bmkr' c6, 'ciuh' c7  from
> >> > mem_heavy1) d group by d.c1, d.c2, d.c3, d.c4, d.c5, d.c6, d.c7;
> >> >
> >> >
> >> > As you can see, there is only one disctinct group in all the above
> >> cases.
> >> > It looks like the memory usage is proportional to no of elements in
> the
> >> > group by clause. Is this expected?
> >> >
> >> > Is the increase in time expected between the above queries? (As we did
> >> not
> >> > introduce any new groups)
> >> >
> >> > - Rahul
> >> >
> >>
> >
>

Reply via email to