[ https://issues.apache.org/jira/browse/DRILL-5478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16168204#comment-16168204 ]

Paul Rogers commented on DRILL-5478:
------------------------------------

The spill file size is not meant to be a user-tunable parameter. Instead, it is 
externalized only so we can tweak it, if needed, to resolve specific support 
situations.

The file size is honored only if the sort has enough memory to buffer that 
much batch data. Suppose you set the file size to 1 GB. Then more than 1 GB 
must be available to the sort to hold all of that data in memory prior to 
spilling. (Why in memory? Spill files are sorted, so all data that goes into 
a file must be buffered and sorted before it is spilled.)
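
The constraint above can be sketched as a small calculation. This is only an 
illustrative model, not Drill's actual code: the function name and the 
`reserved_bytes` parameter are hypothetical, and the latter stands in for 
whatever memory the sort presumably sets aside for purposes other than 
buffering incoming batches.

```python
GB = 1024 ** 3
MB = 1024 ** 2


def effective_spill_file_size(configured_bytes: int,
                              sort_memory_bytes: int,
                              reserved_bytes: int = 0) -> int:
    """Hypothetical model: a spill file can be no larger than the data
    the sort is able to buffer (and sort) in memory before spilling."""
    # Memory left over for buffering batches; never negative.
    bufferable = max(sort_memory_bytes - reserved_bytes, 0)
    # The configured size acts as an upper bound, not a guarantee.
    return min(configured_bytes, bufferable)


# 1 GB file size requested, but only 512 MB available to the sort.
print(effective_spill_file_size(1 * GB, 512 * MB))   # 536870912 (512 MB)

# 256 MB file size requested with 1 GB available: the setting is honored.
print(effective_spill_file_size(256 * MB, 1 * GB))   # 268435456 (256 MB)
```

In this model the configured file size is a ceiling: the less memory the sort 
can devote to buffering, the smaller the spill files come out.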

> Spill file size parameter is not honored by the managed external sort
> ---------------------------------------------------------------------
>
>                 Key: DRILL-5478
>                 URL: https://issues.apache.org/jira/browse/DRILL-5478
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Relational Operators
>    Affects Versions: 1.10.0
>            Reporter: Rahul Challapalli
>            Assignee: Paul Rogers
>             Fix For: 1.12.0
>
>
> git.commit.id.abbrev=1e0a14c
> Query:
> {code}
> ALTER SESSION SET `exec.sort.disable_managed` = false;
> alter session set `planner.width.max_per_node` = 1;
> alter session set `planner.disable_exchanges` = true;
> alter session set `planner.width.max_per_query` = 1;
> alter session set `planner.memory.max_query_memory_per_node` = 1052428800;
> alter session set `planner.enable_decimal_data_type` = true;
> select count(*) from (
>   select * from dfs.`/drill/testdata/resource-manager/all_types_large` d1
>   order by d1.map.missing
> ) d;
> {code}
> Boot Options (spill file size is set to 256MB)
> {code}
> 0: jdbc:drill:zk=10.10.100.190:5181> select * from sys.boot where name like '%spill%';
> +--------------------------------------------------+---------+-------+---------+----------+----------------------------------------------------+-----------+------------+
> |                       name                       |  kind   | type  | status  | num_val  |                     string_val                     | bool_val  | float_val  |
> +--------------------------------------------------+---------+-------+---------+----------+----------------------------------------------------+-----------+------------+
> | drill.exec.sort.external.spill.directories       | STRING  | BOOT  | BOOT    | null     | [
>     # drill-override.conf: 26
>     "/tmp/test"
> ]  | null      | null       |
> | drill.exec.sort.external.spill.file_size         | STRING  | BOOT  | BOOT    | null     | "256M"                                             | null      | null       |
> | drill.exec.sort.external.spill.fs                | STRING  | BOOT  | BOOT    | null     | "maprfs:///"                                       | null      | null       |
> | drill.exec.sort.external.spill.group.size        | LONG    | BOOT  | BOOT    | 40000    | null                                               | null      | null       |
> | drill.exec.sort.external.spill.merge_batch_size  | STRING  | BOOT  | BOOT    | null     | "16M"                                              | null      | null       |
> | drill.exec.sort.external.spill.spill_batch_size  | STRING  | BOOT  | BOOT    | null     | "8M"                                               | null      | null       |
> | drill.exec.sort.external.spill.threshold         | LONG    | BOOT  | BOOT    | 40000    | null                                               | null      | null       |
> +--------------------------------------------------+---------+-------+---------+----------+----------------------------------------------------+-----------+------------+
> {code}
> Below are the spill files while the query is still executing. The size of the 
> spill files is ~34MB
> {code}
> -rwxr-xr-x   3 root root   34957815 2017-05-05 11:26 /tmp/test/26f33c36-4235-3531-aeaa-2c73dc4ddeb5_major0_minor0_op5_sort/run1
> -rwxr-xr-x   3 root root   34957815 2017-05-05 11:27 /tmp/test/26f33c36-4235-3531-aeaa-2c73dc4ddeb5_major0_minor0_op5_sort/run2
> -rwxr-xr-x   3 root root          0 2017-05-05 11:27 /tmp/test/26f33c36-4235-3531-aeaa-2c73dc4ddeb5_major0_minor0_op5_sort/run3
> {code}
> The data set is too large to attach here. Reach out to me if you need anything.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
