Github user kkhatua commented on the issue:
https://github.com/apache/drill/pull/1086
The primary objective of the JIRA (and this PR) is to allow Drill to
start up with a truly minimal memory footprint. Most people trying Drill start
by experimenting against a local FS, move on to a small distributed FS, and then
eventually scale out the number of nodes in the DFS.
The challenge has been to figure out a reasonably small size at which
Drill can function without failing. A setting of **5GB** (`1G Heap` + `3G
Direct` + `512M CodeCache`) was sufficient for a single-node Drill instance to
start up and run the TPCH benchmark (single user, via SQLLine) at
**scaleFactor=1**, against CSV text and uncompressed Parquet formats. The queries
that failed did so because the HashAgg operator did not have sufficient memory;
as the error message suggested, enabling the HashAgg fallback option worked
around this. Heap memory peaked at `800MB`, while the Drillbit process
itself maxed out at `2.2GB`.
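For reference, the 5GB setting above would be expressed in `conf/drill-env.sh` roughly as follows. This is a sketch: the variable names (`DRILL_HEAP`, `DRILL_MAX_DIRECT_MEMORY`, `DRILLBIT_CODE_CACHE_SIZE`) follow Drill's launch scripts, but should be verified against the version in use.

```shell
# Sketch of a minimal-footprint conf/drill-env.sh (~5GB total).
# Variable names follow Drill's launch scripts; verify against your Drill version.
export DRILL_HEAP=${DRILL_HEAP:-"1G"}                               # JVM heap
export DRILL_MAX_DIRECT_MEMORY=${DRILL_MAX_DIRECT_MEMORY:-"3G"}     # off-heap (direct) memory
export DRILLBIT_CODE_CACHE_SIZE=${DRILLBIT_CODE_CACHE_SIZE:-"512M"} # JIT code cache
```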
Functional tests from the drill-test framework also ran to completion
without issue.
However, the unit tests hang because of insufficient memory. They
typically need about `3GB Heap` and `4GB Direct` (~8GB total), which is defined
separately in the `pom.xml`.
Based on this, we have the following options if we wish to reduce the
default memory footprint:
1. Reduce the default memory to 5GB, but leave the unit tests' memory
requirement (8GB at last check) intact.
We don't expect most users to run unit tests; those who do, with the 8GB
setting, will see them run to completion provided the machine has enough memory.
If a user feeds Drill an excessively large input, Drill is expected to
correctly report insufficient memory availability, and there is good
documentation explaining how to increase memory.
2. Reduce the default memory to match the unit-test requirement of 8GB.
This still reduces the default substantially, but 8GB remains high for
someone trying out Drill in a memory-constrained VM (or sandboxed)
environment. It also means we must track the minimum memory requirements of
the unit tests and keep the default in sync with them, even when the user
never intends to run unit tests in that limited-memory environment.
3. Change nothing, but introduce sample files along the lines of
`$DRILL_HOME/conf/drill-override.conf.example`.
The proposal here is to ship a minimum-viable-memory config file
(e.g. `drill-env.sh.minMem`), which a first-time user of Drill can swap in
for `./conf/drill-env.sh` when bringing up Drill. The flip side is that a
Drill trial user would need to know to swap the config file for low-memory
usage before starting up Drill.
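Under option 3, the first-time user's swap step would look something like the following. Note that `drill-env.sh.minMem` is the hypothetical sample file name from the proposal, not an existing file:

```shell
# Back up the stock config and activate the low-memory sample
# (drill-env.sh.minMem is the proposed, hypothetical sample file).
cd "$DRILL_HOME/conf"
cp drill-env.sh drill-env.sh.orig
cp drill-env.sh.minMem drill-env.sh

# Then start the Drillbit as usual:
"$DRILL_HOME/bin/drillbit.sh" start
```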
There seems to be consensus that users should not feel intimidated about
installing Drill and working with small-to-medium workloads on something as
low-end as a laptop. Based on that, I'd recommend option #1, which is what
this PR implements. I'll leave the PR open for discussion.
---