[ https://issues.apache.org/jira/browse/DRILL-6076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16331895#comment-16331895 ]

ASF GitHub Bot commented on DRILL-6076:
---------------------------------------

Github user kkhatua commented on the issue:

    https://github.com/apache/drill/pull/1086
  
    The primary objective of the JIRA (and this PR) is to allow Drill to truly 
start up with a bare-minimum memory footprint. Most people trying Drill start 
by experimenting with a LocalFS, then try a small distributed FS, and 
eventually scale out the number of nodes for the DFS.
    
    The challenge has been to figure out a reasonably small size at which 
Drill can function without failing. A setting of *5GB* (`1G Heap`+`3G 
Direct`+`512M CodeCache`) was sufficient for a single-node Drill instance to 
start up and run the TPCH benchmark (single user, via SQLLine) at 
*scaleFactor=1*, against +CSV-Text+ and uncompressed +Parquet+ formats. The 
only query failures were due to the HashAgg operator not having sufficient 
memory; to work around this, the HashAgg fallback option was enabled, as the 
error message suggested. Heap memory peaked at `800MB`, while the Drillbit 
process itself maxed out at `2.2GB`.
    Functional tests from the drill-test framework also ran to completion 
without issue.
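    
    For reference, a minimal sketch of the low-memory settings used for the 
run above, assuming the standard variable names from the shipped 
`drill-env.sh` template (`DRILL_HEAP`, `DRILL_MAX_DIRECT_MEMORY`, 
`DRILLBIT_CODE_CACHE_SIZE`) and, if I recall it correctly, the option name 
that the HashAgg error message points at:
    ```bash
    # drill-env.sh overrides approximating the 5GB test configuration above.
    export DRILL_HEAP="1G"                  # JVM heap
    export DRILL_MAX_DIRECT_MEMORY="3G"     # off-heap (direct) memory
    export DRILLBIT_CODE_CACHE_SIZE="512m"  # JIT code cache

    # The HashAgg failures were worked around at runtime from SQLLine with the
    # system option suggested by the error message, e.g.:
    #   ALTER SYSTEM SET `drill.exec.hashagg.fallback.enabled` = true;
    ```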
    
    However, the unit tests hang because of insufficient memory. They 
typically need about `3GB Heap` and `4GB Direct` (~8GB total), which is 
defined separately in the `pom.xml`.
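    
    The unit-test JVMs get their memory from the Surefire configuration in 
`pom.xml` (the exact property names live there), independent of 
`drill-env.sh`. A sketch of what someone trying Drill in a memory-constrained 
environment could do instead of running them:
    ```bash
    # Unit-test memory is configured in pom.xml (Surefire fork args), not in
    # drill-env.sh, so a low-memory trial install can simply build without
    # running the tests:
    mvn clean install -DskipTests
    ```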
    
    Based on this, we have the following options if we wish to move towards 
reducing the default memory footprint:
    
    1. Reduce the default memory to 5GB, but leave the unit tests' memory 
requirements (8GB at last check) intact.
      We don't expect most users to run unit tests; anyone who does run them 
with the 8GB setting will see them complete, provided the machine has enough 
memory.
      If a user gives Drill an excessively large input to process, Drill is 
expected to correctly report the lack of available memory, and there is good 
documentation explaining how to increase it.
    
    2. Reduce the default memory to match the unit-test memory requirement of 8GB.
      This still cuts the requirement substantially from the current 13GB, but 
8GB remains high for someone trying out Drill in a memory-constrained VM (or 
sandboxed) environment. It also means we would have to track the minimum 
memory requirements of the unit tests and keep the defaults in sync with them, 
even though a user in a limited-memory environment is unlikely to run the 
unit tests at all.
    
    3. Change nothing, but introduce sample files along the lines of 
`$DRILL_HOME/conf/drill-override.conf.example`.
      The proposal here is to ship a minimum-viable-memory config file (e.g. 
`drill-env.sh.minMem`), which a first-time user of Drill can swap in for 
`./conf/drill-env.sh` when bringing up Drill (sketched below). The flip side 
is that a trial user would need to know to swap the config file for low-memory 
usage before starting Drill.
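    
    A rough sketch of the option-3 workflow (note: `drill-env.sh.minMem` is 
the example name proposed above, not a file that ships today):
    ```bash
    # Swap in the proposed minimum-memory sample before first startup.
    cd $DRILL_HOME/conf
    cp drill-env.sh drill-env.sh.orig       # keep the shipped defaults around
    cp drill-env.sh.minMem drill-env.sh     # use the low-memory profile
    $DRILL_HOME/bin/drillbit.sh restart
    ```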
    
    There seems to be consensus that users should not feel intimidated about 
installing Drill and working with small-to-medium workloads on something as 
low-end as a laptop. Based on that, I'd recommend option #1, which is what 
this PR implements. I'll leave the PR open for discussion.


> Reduce the default memory from a total of 13GB to 5GB
> -----------------------------------------------------
>
>                 Key: DRILL-6076
>                 URL: https://issues.apache.org/jira/browse/DRILL-6076
>             Project: Apache Drill
>          Issue Type: Task
>            Reporter: Kunal Khatua
>            Assignee: Kunal Khatua
>            Priority: Critical
>             Fix For: 1.13.0
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Currently, the default memory requirements for Drill are about 13GB, with the 
> following allocations:
> * 4GB Heap
> * 8GB Direct Memory
> * 1GB CodeCache
> * 512MB MaxPermSize
> Also, with Drill 1.12.0, the recommendation is to move to JDK 8, which makes 
> MaxPermSize irrelevant.
> With that, the default requirements total 13GB, which is rather high. This is 
> especially a problem when people are trying out Drill in a development 
> environment, where 13GB is too much.
> When using the public [test 
> framework|https://github.com/mapr/drill-test-framework/] for Apache Drill, it 
> was observed that the framework's functional and unit tests passed 
> successfully with as little as 5GB of memory, based on the following allocation:
> * 1GB Heap
> * 3GB Direct Memory
> * 512MB CodeCache
> * 512MB MaxPermSize
> Based on this finding, the proposal is to reduce the defaults from the 
> current settings to the values just mentioned above. The drill-env.sh file 
> already has details in the comments, along with the recommended values that 
> reflect the original 13GB defaults.


