Rahul, can you send me the query profile separately ? Also, can you try group-by on fixed-width columns instead of Varchar ? With single group, the hash table itself should be consuming relatively small amount of memory.
On Fri, May 27, 2016 at 11:14 AM, Zelaine Fong <zf...@maprtech.com> wrote: > My guess would be that for hashing, a hash table is pre-allocated based on > the number of keys in the hash. That would explain why with more keys, the > memory usage grows. But that's just my guess. Someone who really > understands how this works should chime in :). > > -- Zelaine > > On Fri, May 27, 2016 at 10:36 AM, rahul challapalli < > challapallira...@gmail.com> wrote: > > > Any inputs on this one? > > > > On Wed, May 25, 2016 at 7:51 PM, rahul challapalli < > > challapallira...@gmail.com> wrote: > > > > > Its using hash aggregation. > > > On May 25, 2016 7:48 PM, "Zelaine Fong" <zf...@maprtech.com> wrote: > > > > > >> What does the explain plan show? I.e., is the group by being done > via a > > >> hash agg or a streaming agg? If it's a streaming agg, then you still > > have > > >> to sort the entire data set before you reduce it down to a single > group. > > >> That would explain the increase in memory as you add group by keys. > > >> > > >> -- Zelaine > > >> > > >> On Wed, May 25, 2016 at 5:50 PM, rahul challapalli < > > >> challapallira...@gmail.com> wrote: > > >> > > >> > I am trying to understand the memory usage patterns for hash > > aggregate. > > >> The > > >> > below query completes in 9.163 seconds and uses 24 MB of memory for > > >> > hash-aggregate (according to profile) > > >> > > > >> > select max(d.l_linenumber) from (select l_linenumber, 'asdf' c1, > > 'kfjhl' > > >> > c2, 'reyui' c3, 'khdfs' c4, 'vkhj' c5 from mem_heavy1) d group by > > d.c1, > > >> > d.c2, d.c3, d.c4, d.c5; > > >> > > > >> > Adding one more constant column to the group by, the below query > takes > > >> > 11.638 seconds and uses 29 MB of ram > > >> > > > >> > select max(d.l_linenumber) from (select l_linenumber, 'asdf' c1, > > 'kfjhl' > > >> > c2, 'reyui' c3, 'khdfs' c4, 'vkhj' c5, 'bmkr' c6 from mem_heavy1) d > > >> group > > >> > by d.c1, d.c2, d.c3, d.c4, d.c5, d.c6; > > >> > > > >> > The below query with one more constant column added to group by > 14.622 > > >> > seconds and uses 33 MB memory > > >> > > > >> > select max(d.l_linenumber) from (select l_linenumber, 'asdf' c1, > > 'kfjhl' > > >> > c2, 'reyui' c3, 'khdfs' c4, 'vkhj' c5, 'bmkr' c6, 'ciuh' c7 from > > >> > mem_heavy1) d group by d.c1, d.c2, d.c3, d.c4, d.c5, d.c6, d.c7; > > >> > > > >> > > > >> > As you can see, there is only one disctinct group in all the above > > >> cases. > > >> > It looks like the memory usage is proportional to no of elements in > > the > > >> > group by clause. Is this expected? > > >> > > > >> > Is the increase in time expected between the above queries? (As we > did > > >> not > > >> > introduce any new groups) > > >> > > > >> > - Rahul > > >> > > > >> > > > > > >