With limited memory and what appears to be higher concurrency, you may want to reduce the number of minor fragments (threads) per query per node. See whether you can lower planner.width.max_per_node on the cluster without too much impact on response times.
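As a minimal sketch (assuming access to the cluster through sqlline or the Web UI; the value 4 below is only an illustrative starting point, not a recommendation), the option can be inspected and lowered like this:

  -- Check the current per-node parallelization width
  SELECT * FROM sys.options WHERE name = 'planner.width.max_per_node';

  -- Reduce the number of minor fragments (threads) per query per node,
  -- then compare response times before and after the change
  ALTER SYSTEM SET `planner.width.max_per_node` = 4;

By default Drill derives this width from roughly 70% of the available cores, so an explicit lower value simply caps per-node parallelism.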
Slightly smaller (512 MB) parquet files may also help, but restructuring the data is usually harder than changing system settings (see the CTAS sketch after the quoted message below).

--Andries

On 6/29/17, 7:39 AM, "François Méthot" <[email protected]> wrote:

Hi,

I am investigating an issue where we started getting Out of Heap space errors when querying parquet files in Drill 1.10. The heap is currently set to 8 GB and off-heap to 20 GB; we can't spare more.

We usually query 0.7 to 1.2 GB parquet files; recently we have been more on the 1.2 GB side, for the same number of files. It now fails on a simple "select <bunch of fields> ... where <needle-in-haystack params>" type of query.

Drill is configured with the old reader (store.parquet.use_new_reader=false) because of DRILL-5435 (Limit causes a memory leak).

I have temporarily set the max number of large queries to 2 instead of 10, and it has helped so far.

My questions: Could parquet file size be related to these new exceptions? Would reducing the max file size help improve query robustness in Drill (at the expense of having more files to scan)?

Thanks
Francois
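Picking up the two knobs discussed above, here is a hedged sketch of what the data rewrite and the query-queue change might look like. store.parquet.block-size and exec.queue.large are standard Drill options, but the target table name and source path are only examples, and the "max number of large queries" in the original mail is assumed to refer to exec.queue.large (which only applies when query queueing is enabled):

  -- Rewrite the data into ~512 MB parquet files with CTAS
  -- (store.parquet.block-size caps the row-group size Drill writes;
  --  dfs.tmp.haystack_512mb and the source path are hypothetical)
  ALTER SESSION SET `store.parquet.block-size` = 536870912;  -- 512 MB
  CREATE TABLE dfs.tmp.haystack_512mb AS
  SELECT * FROM dfs.`/data/haystack`;

  -- Limit concurrent large queries to 2, assuming the query queue is enabled
  ALTER SYSTEM SET `exec.queue.enable` = true;
  ALTER SYSTEM SET `exec.queue.large` = 2;

Smaller files do mean more files to scan (more planning and metadata work), so this trades scan overhead against the per-query memory footprint.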
