With limited memory and what appears to be higher concurrency, you may want to reduce the number of minor fragments (threads) per query per node. See whether you can lower planner.width.max_per_node on the cluster without too much impact on response times.
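As a minimal sketch (assuming access to the cluster through sqlline or the Web UI; the value 4 below is only an illustrative starting point, not a recommendation), the option can be inspected and lowered like this:

  -- Check the current per-node parallelization width
  SELECT * FROM sys.options WHERE name = 'planner.width.max_per_node';

  -- Reduce the number of minor fragments (threads) per query per node,
  -- then compare response times before and after the change
  ALTER SYSTEM SET `planner.width.max_per_node` = 4;

By default Drill derives this width from roughly 70% of the available cores, so an explicit lower value simply caps per-node parallelism.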
Slightly smaller (512 MB) parquet files may also help, but restructuring the data is usually harder than changing system settings (see the CTAS sketch after the quoted message below).

--Andries

On 6/29/17, 7:39 AM, "François Méthot" <[email protected]> wrote:

Hi,

I am investigating an issue where we started getting Out of Heap space errors when querying parquet files in Drill 1.10. The heap is currently set to 8 GB and off-heap to 20 GB; we can't spare more.

We usually query 0.7 to 1.2 GB parquet files; recently we have been more on the 1.2 GB side, for the same number of files. It now fails on a simple "select <bunch of fields> ... where <needle-in-haystack params>" type of query.

Drill is configured with the old reader (store.parquet.use_new_reader=false) because of DRILL-5435 (Limit causes a memory leak).

I have temporarily set the max number of large queries to 2 instead of 10, and it has helped so far.

My questions: Could parquet file size be related to these new exceptions? Would reducing the max file size help improve query robustness in Drill (at the expense of having more files to scan)?

Thanks
Francois
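Picking up the two knobs discussed above, here is a hedged sketch of what the data rewrite and the query-queue change might look like. store.parquet.block-size and exec.queue.large are standard Drill options, but the target table name and source path are only examples, and the "max number of large queries" in the original mail is assumed to refer to exec.queue.large (which only applies when query queueing is enabled):

  -- Rewrite the data into ~512 MB parquet files with CTAS
  -- (store.parquet.block-size caps the row-group size Drill writes;
  --  dfs.tmp.haystack_512mb and the source path are hypothetical)
  ALTER SESSION SET `store.parquet.block-size` = 536870912;  -- 512 MB
  CREATE TABLE dfs.tmp.haystack_512mb AS
  SELECT * FROM dfs.`/data/haystack`;

  -- Limit concurrent large queries to 2, assuming the query queue is enabled
  ALTER SYSTEM SET `exec.queue.enable` = true;
  ALTER SYSTEM SET `exec.queue.large` = 2;

Smaller files do mean more files to scan (more planning and metadata work), so this trades scan overhead against the per-query memory footprint.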
