[
https://issues.apache.org/jira/browse/DRILL-6076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16331895#comment-16331895
]
ASF GitHub Bot commented on DRILL-6076:
---------------------------------------
Github user kkhatua commented on the issue:
https://github.com/apache/drill/pull/1086
The primary objective of the JIRA (and this PR) is to allow Drill to start up
with a bare-minimum memory footprint. Most people trying Drill start by
experimenting with a local file system, then try a small distributed file
system (DFS), and eventually scale out the number of nodes for the DFS.
The challenge has been to figure out a reasonably small memory size at which
Drill functions without failing. A setting of *5GB* (`1G Heap`+`3G
Direct`+`512M CodeCache`) was sufficient for a single-node Drill instance to
start up and run the TPCH benchmark (single user, via SQLLine) at
*scaleFactor=1* against +CSV-Text+ and uncompressed +Parquet+ formats. The
queries that failed did so because the HashAgg operator did not have
sufficient memory. To work around this, the HashAgg fallback option was
enabled, as the error message suggested. The heap memory peaked at `800MB`,
while the Drillbit process itself maxed out at `2.2GB`.
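As a rough sketch, the tested settings could be expressed in
`conf/drill-env.sh` as shown here. The variable names are assumed to match
those in the stock `drill-env.sh` and should be verified against the installed
copy; `to_mb` is a hypothetical helper added only to total the values. Note
that the three allocations listed above add up to 4.5GB; the 5GB figure in the
JIRA description also counts 512MB of MaxPermSize.

```shell
# Sketch of the reduced-footprint settings (assumed variable names; check
# them against the drill-env.sh shipped with your Drill version).
export DRILL_HEAP=${DRILL_HEAP:-"1G"}
export DRILL_MAX_DIRECT_MEMORY=${DRILL_MAX_DIRECT_MEMORY:-"3G"}
export DRILLBIT_CODE_CACHE_SIZE=${DRILLBIT_CODE_CACHE_SIZE:-"512M"}

# Hypothetical helper: convert a size such as "3G" or "512M" to megabytes.
to_mb() {
  case "$1" in
    *G) echo $(( ${1%G} * 1024 )) ;;
    *M) echo "${1%M}" ;;
    *)  echo "unsupported size: $1" >&2; return 1 ;;
  esac
}

# Sum heap, direct memory and code cache to see the configured footprint.
TOTAL_MB=$(( $(to_mb "$DRILL_HEAP") + $(to_mb "$DRILL_MAX_DIRECT_MEMORY") + $(to_mb "$DRILLBIT_CODE_CACHE_SIZE") ))
echo "Total configured: ${TOTAL_MB} MB"
```

Because each assignment keeps any value already set in the environment, a user
can still raise an individual limit without editing the file.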
Functional tests from the drill-test framework also ran to completion
without issue.
However, the unit tests hang because of insufficient memory. The tests
typically need about `3GB Heap` and `4GB Direct` (~8GB total), settings that
are defined separately in the `pom.xml`.
Based on this, we have the following options, if we wish to move towards
reducing the default memory footprint:
1. Reduce default memory to 5GB, but leave the unit tests’ memory
requirements (8GB at last check) intact.
We don’t expect typical users to run the unit tests; anyone who does would
still get the 8GB setting, and the tests would run to completion provided the
machine has enough memory.
If a user does give Drill an excessively large input to process, Drill
is expected to correctly report insufficient memory, and there
is good documentation explaining how to increase memory.
2. Reduce default memory to match the unit-test memory requirements of 8GB.
This reduces the memory requirement substantially from the current 13GB, but
is still high for someone trying out Drill in a memory-constrained VM (or
sandboxed) environment. It also means that we need to track the minimum
memory requirements of the unit tests and keep the default in sync with them,
even if the user does not intend to run unit tests in the limited-memory
environment.
3. Change nothing, but introduce sample files along the lines of
`$DRILL_HOME/conf/drill-override.conf.example`.
The proposal here would be to have a minimum viable memory footprint
config file (e.g. `drill-env.sh.minMem`), which a first-time user of Drill can
swap with `./conf/drill-env.sh` when bringing up Drill. The downside
is that a Drill trial user would need to know to swap the
config file for low-memory usage before starting up Drill.
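For option 3, the swap a first-time user would have to perform might look
roughly like the following. This is only a sketch: `drill-env.sh.minMem` is
the proposed sample file named above and does not exist yet, and
`use_min_mem_config` is a hypothetical helper name.

```shell
# Hypothetical helper: replace conf/drill-env.sh with the proposed
# minimum-memory sample, keeping a backup of the current file.
use_min_mem_config() {
  conf_dir="$1"
  sample="$conf_dir/drill-env.sh.minMem"   # proposed sample file (assumption)
  target="$conf_dir/drill-env.sh"

  if [ ! -f "$sample" ]; then
    echo "sample config not found: $sample" >&2
    return 1
  fi

  # Preserve the current config so the default settings can be restored later.
  if [ -f "$target" ]; then
    cp "$target" "$target.orig"
  fi
  cp "$sample" "$target"
}

# Usage (assumes DRILL_HOME is set), before starting the Drillbit:
#   use_min_mem_config "$DRILL_HOME/conf"
```

Even scripted, this remains an extra step that the user must know about before
the first start-up, which is the drawback noted for this option.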
There seems to be consensus that users should not feel intimidated when
installing Drill and starting to work with small-to-medium workloads on
something as low-end as a laptop. Based on that, I'd recommend option #1,
which is this PR. I'll leave this PR open for discussion.
> Reduce the default memory from a total of 13GB to 5GB
> -----------------------------------------------------
>
> Key: DRILL-6076
> URL: https://issues.apache.org/jira/browse/DRILL-6076
> Project: Apache Drill
> Issue Type: Task
> Reporter: Kunal Khatua
> Assignee: Kunal Khatua
> Priority: Critical
> Fix For: 1.13.0
>
> Original Estimate: 24h
> Remaining Estimate: 24h
>
> Currently, the default memory requirements for Drill are about 13GB, with the
> following allocations:
> * 4GB Heap
> * 8GB Direct Memory
> * 1GB CodeCache
> * 512MB MaxPermSize
> Also, with Drill 1.12.0, the recommendation is to move to JDK8, which makes
> MaxPermSize irrelevant.
> With that, the default requirements total to 13GB, which is rather high. This
> is especially a problem for scenarios where people are trying out Drill and
> might be using this in a development environment where 13GB is too high.
> When using the public [test
> framework|https://github.com/mapr/drill-test-framework/] for Apache Drill, it
> was observed that the framework's functional and unit tests passed
> successfully with memory as little as 5GB; based on the following allocation:
> * 1GB Heap
> * 3GB Direct Memory
> * 512MB CodeCache
> * 512MB MaxPermSize
> Based on this finding, the proposal is to reduce the defaults from the
> current settings to the values just mentioned above. The drill-env.sh file
> already has details in the comments, along with the recommended values that
> reflect the original 13GB defaults.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)