I've had some luck disabling multi-phase aggregations on some queries where memory was an issue:
https://drill.apache.org/docs/guidelines-for-optimizing-aggregation/

After I try that, I then typically look at the hash aggregation settings, as you have done:
https://drill.apache.org/docs/sort-based-and-hash-based-memory-constrained-operators/

I've had limited success with changing max_query_memory_per_node and max_width; sometimes it's a weird combination of settings that works there:
https://drill.apache.org/docs/troubleshooting/#memory-issues

Back to your spill question: if you disable hash aggregation, do you know whether your spill directories are set up? That may be part of the issue; I am not sure what Drill's default spill-directory configuration is.

Concretely, here is roughly what each of those changes looks like.
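Disabling multi-phase aggregation is just a session option. A minimal sketch, using the option name from that guidelines page (worth double-checking against sys.options on your build):

ALTER SESSION SET `planner.enable_multiphase_agg` = false;

-- sanity check that the setting took effect
SELECT * FROM sys.options WHERE name LIKE '%multiphase%';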
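For the hash operators themselves, these are the switches I use. Note that with hash aggregation off, the planner falls back to a streaming aggregate, which needs a sort, so this interacts with the sort memory issue you quote below:

ALTER SESSION SET `planner.enable_hashagg` = false;
ALTER SESSION SET `planner.enable_hashjoin` = false;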
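For the memory knobs, this is the shape of what I've tried; the values here are illustrative only (max_query_memory_per_node is in bytes), and what worked for me was trial and error per query:

-- give the sort/aggregation operators more memory per node (10 GB here, illustrative)
ALTER SESSION SET `planner.memory.max_query_memory_per_node` = 10737418240;
-- fewer fragments per node means more memory available to each fragment
ALTER SESSION SET `planner.width.max_per_node` = 8;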
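The spill directories are boot options in drill-override.conf rather than session options, so they have to be set on every drillbit and the drillbits restarted. A sketch with placeholder paths, using the sort spill keys as I understand them from that same doc page (if I remember right, the default points at local /tmp, which fills up fast on big sorts):

drill.exec: {
  sort.external.spill: {
    fs: "file:///",  # or an hdfs:// URI to spill to HDFS instead of local disk
    directories: [ "/data/drill/spill" ]  # placeholder path; point at large local disks
  }
}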
On Fri, Mar 11, 2016 at 2:17 PM, François Méthot <[email protected]> wrote:

> Hi,
>
> Using version 1.5, DirectMemory is currently set at 32 GB and heap is at
> 8 GB. We have been trying to perform multiple aggregations in one query
> (see below) on 40 billion+ rows stored on 13 nodes. We are using the
> Parquet format.
>
> We keep getting OutOfMemoryException: Failure allocating buffer..
>
> on a query that looks like this:
>
> create table hdfs.`test1234` as
> (
>   select string_field1,
>          string_field2,
>          min(int_field3),
>          max(int_field4),
>          count(1),
>          count(distinct int_field5),
>          count(distinct int_field6),
>          count(distinct string_field7)
>   from hdfs.`/data/`
>   group by string_field1, string_field2
> );
>
> The documentation states:
> "Currently, hash-based operations do not spill to disk as needed."
>
> and
>
> "If the hash-based operators run out of memory during execution, the query
> fails. If large hash operations do not fit in memory on your system, you
> can disable these operations. When disabled, Drill creates alternative
> plans that allow spilling to disk."
>
> My understanding is that it will fall back to streaming aggregation, which
> requires sorting...
>
> but
>
> "As of Drill 1.5, ... the sort operator (in queries that ran successfully
> in previous releases) may not have enough memory, resulting in a failed
> query"
>
> And indeed, disabling hash agg and hash join resulted in a memory leak
> error.
>
> So it looks like increasing direct memory is our only option.
>
> Is there a plan to have Hash Aggregation spill to disk in the next
> release?
>
> Thanks for your feedback
