Github user kkhatua commented on the issue:

    https://github.com/apache/drill/pull/1086
  
    The primary objective of the JIRA (and this PR) is to allow Drill to truly 
start up with a bare-minimum memory footprint. Most people trying Drill start by 
experimenting with a local FS, before trying a small distributed FS and then 
eventually scaling out the number of nodes in the DFS.
    
    The challenge has been to figure out a reasonably small size at which 
Drill can function without failing. A setting of *5GB* (`1G Heap`+`3G 
Direct`+`512M CodeCache`) was sufficient for a single-node Drill instance to 
start up and run the TPCH benchmark (single user, via SQLLine) at 
*scaleFactor=1*, against +CSV-Text+ and uncompressed +Parquet+ formats. The 
queries that failed did so because the HashAgg operator did not have sufficient 
memory. To work around this, as the error message suggested, the HashAgg 
fallback option was enabled. The heap memory peaked at `800MB`, while the 
Drillbit process itself maxed out at `2.2GB`.
    Functional tests from the drill-test framework also ran to completion 
without issue.
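    For reference, the tested footprint could be expressed in 
`conf/drill-env.sh` roughly as below. This is a hedged sketch, assuming the 
standard Drill environment variables for memory limits 
(`DRILL_HEAP`, `DRILL_MAX_DIRECT_MEMORY`, `DRILLBIT_CODE_CACHE_SIZE`); the 
values mirror the 1G heap + 3G direct + 512M code cache configuration 
described above.

    ```shell
    # Sketch of a minimal-footprint conf/drill-env.sh (assumed variable names).
    # Values match the ~5GB configuration that passed TPCH at scaleFactor=1.
    export DRILL_HEAP=${DRILL_HEAP:-"1G"}                              # JVM heap
    export DRILL_MAX_DIRECT_MEMORY=${DRILL_MAX_DIRECT_MEMORY:-"3G"}    # direct memory
    export DRILLBIT_CODE_CACHE_SIZE=${DRILLBIT_CODE_CACHE_SIZE:-"512m"} # JIT code cache
    ```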
    
    However, the unit tests hang at this lower setting because of insufficient 
memory. The tests typically need about `3GB Heap` and `4GB Direct` (~8GB 
total), which is defined separately in the `pom.xml`.
    
    Based on this, we have the following options, if we wish to move towards 
reducing the default memory footprint:
    
    1. Reduce the default memory to 5GB, but leave the unit tests’ memory 
requirements (8GB at last check) intact.
      We don’t expect most users to run unit tests; anyone who does run them 
with the 8GB setting will see them complete, provided the machine has enough 
memory.
      If a user does give Drill an excessively large input to process, Drill is 
expected to correctly report the insufficient memory, and there is good 
documentation explaining how to increase memory.
    
    2. Reduce the default memory to match the unit-test requirement of 8GB.
      This still reduces the memory requirement substantially, but 8GB is high 
for someone trying out Drill in a memory-constrained VM (or sandboxed) 
environment. It also means we would need to track the minimum memory 
requirements of the unit tests and keep the default in sync with them, even 
though a user in a limited-memory environment is unlikely to run the unit 
tests.
    
    3. Change nothing, but introduce sample files along the lines of 
`$DRILL_HOME/conf/drill-override.conf.example`.
      The proposal here is to ship a minimum-viable-memory config file (e.g. 
`drill-env.sh.minMem`), which a first-time user of Drill can swap in for 
`./conf/drill-env.sh` when bringing up Drill. The flip side is that a trial 
user would need to know to swap in the low-memory config file before starting 
Drill.
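    The swap under option 3 would look something like the following sketch 
(`drill-env.sh.minMem` is the proposed sample filename from above; the backup 
name `drill-env.sh.default` is just an illustration):

    ```shell
    # Swap in the proposed low-memory config before first startup.
    DRILL_HOME="${DRILL_HOME:-/opt/drill}"   # adjust to your install location

    # Keep a copy of the shipped defaults, then activate the low-memory variant.
    cp "$DRILL_HOME/conf/drill-env.sh" "$DRILL_HOME/conf/drill-env.sh.default"
    cp "$DRILL_HOME/conf/drill-env.sh.minMem" "$DRILL_HOME/conf/drill-env.sh"

    # "$DRILL_HOME/bin/drillbit.sh" restart  # then (re)start to pick up the limits
    ```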
    
    There seems to be consensus that users should not feel intimidated about 
installing Drill and working with small-to-medium workloads on something as 
low-end as a laptop. Based on that, I'd recommend option #1, which is what 
this PR implements. I'll leave the PR open for discussion.

