[jira] [Comment Edited] (DRILL-5528) Sorting 19GB data with 14GB memory in a single fragment takes ~150 minutes

Paul Rogers (JIRA) Thu, 08 Jun 2017 17:04:18 -0700

    [ 
https://issues.apache.org/jira/browse/DRILL-5528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16043607#comment-16043607
 ]


Paul Rogers edited comment on DRILL-5528 at 6/9/17 12:02 AM:
-------------------------------------------------------------

Preliminary results suggest that Drill is severely CPU bound for this query. As 
a baseline, ran the query with the original, "unmanaged", sort.

Running on a Mac, with 3 GB given to the sort, the thread running the query is 
at 100% of CPU. But, disk I/O during the read-sort-and-spill phase is very 
small, only about 40MB/s for reads, 20 MB/s for writes peak, with many valleys 
in between -- on an SSD that can do 700 MB/s. Earlier tests got upwards of 70+ 
MB/s.

Later, during the merge phase, read performance jumps to 150 MB/s. CPU is still 
around 95%.

Total run time was just over an hour for a file that formerly took just a few 
minutes on previous test variations.

This suggests that, somehow, something is using far to much CPU to do the work 
of the sort. This also suggests that the excessive CPU use occurs whether the 
sort memory is "managed" or not.

Time summary:

{code}
Results: 1 records, 2 batches, 3,608,140 ms

00 Screen
  Setup:        0 ms -   0%,   0%
  Process:      4 ms -   0%,   0%
01 Project
  Setup:        5 ms -   0%,   0%
  Process:      1 ms -   0%,   0%
02 StreamAgg (STREAMING AGGREGATE)
  Setup:      133 ms -  11%,   0%
  Process:      4 ms -   0%,   0%
03 Project
  Setup:       53 ms -   4%,   0%
  Process:      0 ms -   0%,   0%
04 SelectionVectorRemover (SELECTION VECTOR REMOVER)
  Setup:       51 ms -   4%,   0%
  Process:    501 ms -   0%,   0%
05 Filter
  Setup:      194 ms -  17%,   0%
  Process: 14,910 ms -   0%,   0%
06 Project
  Setup:       41 ms -   3%,   0%
  Process:  1,414 ms -   0%,   0%
07 SelectionVectorRemover (SELECTION VECTOR REMOVER)
  Setup:        6 ms -   0%,   0%
  Process: 33,020 ms -   1%,   1%
08 Sort (EXTERNAL SORT)
  Setup:        0 ms -   0%,   0%
  Process: 1,996,522 ms -  87%,  87%
09 Project
  Setup:      137 ms -  12%,   0%
  Process: 24,847 ms -   1%,   1%
10 Project
  Setup:      499 ms -  44%,   0%
  Process: 45,571 ms -   2%,   2%
11 Scan (TEXT SUB SCAN)
  Setup:        0 ms -   0%,   0%
  Process: 153,725 ms -   6%,   6%
Total:
  Setup:    1,119 ms
  Process: 2,270,519 ms
Input batches: 0, spills: 58, merge/spills: 0
{code}

In summary form:

{code}
08 Sort (EXTERNAL SORT)
  Process: 1,996,522 ms -  87%,  87%
10 Project
  Process: 45,571 ms -   2%,   2%
11 Scan (TEXT SUB SCAN)
  Process: 153,725 ms -   6%,   6%
{code}

The sort itself takes 87% of the time. Reading the original 18 GB of data takes 
just 6% of the time. The read rate during the scan is about 18 GB / 154 s = 116 
MB/s, which is pretty good. So, clearly the delay is in the sort itself 
somewhere.

For a baseline, ran a query that simply reads and discards data:

{code}
select * FROM (select * from dfs.data.`250wide.tbl`)d where d.columns[0] = 
'ljdfhwuehnoiueyf'
{code}

Ran in just 178 seconds. This confirms that the scan part of the query is fine. 
Read rate starts at 200 MB/s, then drops to 100 MB/s. CPU is at 97% suggesting 
that the query, even for a simple read, is CPU bound rather than disk bound. 
Scan time is 139 seconds or about 130 MB/s read rate.


was (Author: paul-rogers):
Preliminary results suggest that Drill is severely CPU bound for this query. As 
a baseline, ran the query with the original, "unmanaged", sort.

Running on a Mac, the thread running the query is at 100% of CPU. But, disk I/O 
during the read-sort-and-spill phase is very small, only about 40MB/s for 
reads, 20 MB/s for writes peak, with many valleys in between -- on an SSD that 
can do 700 MB/s. Earlier tests got upwards of 70+ MB/s.

Later, during the merge phase, read performance jumps to 150 MB/s. CPU is still 
around 95%.

Total run time was just over an hour for a file that formerly took just a few 
minutes on previous test variations.

This suggests that, somehow, something is using far to much CPU to do the work 
of the sort. This also suggests that the excessive CPU use occurs whether the 
sort memory is "managed" or not.

Time summary:

{code}
Results: 1 records, 2 batches, 3,608,140 ms

00 Screen
  Setup:        0 ms -   0%,   0%
  Process:      4 ms -   0%,   0%
01 Project
  Setup:        5 ms -   0%,   0%
  Process:      1 ms -   0%,   0%
02 StreamAgg (STREAMING AGGREGATE)
  Setup:      133 ms -  11%,   0%
  Process:      4 ms -   0%,   0%
03 Project
  Setup:       53 ms -   4%,   0%
  Process:      0 ms -   0%,   0%
04 SelectionVectorRemover (SELECTION VECTOR REMOVER)
  Setup:       51 ms -   4%,   0%
  Process:    501 ms -   0%,   0%
05 Filter
  Setup:      194 ms -  17%,   0%
  Process: 14,910 ms -   0%,   0%
06 Project
  Setup:       41 ms -   3%,   0%
  Process:  1,414 ms -   0%,   0%
07 SelectionVectorRemover (SELECTION VECTOR REMOVER)
  Setup:        6 ms -   0%,   0%
  Process: 33,020 ms -   1%,   1%
08 Sort (EXTERNAL SORT)
  Setup:        0 ms -   0%,   0%
  Process: 1,996,522 ms -  87%,  87%
09 Project
  Setup:      137 ms -  12%,   0%
  Process: 24,847 ms -   1%,   1%
10 Project
  Setup:      499 ms -  44%,   0%
  Process: 45,571 ms -   2%,   2%
11 Scan (TEXT SUB SCAN)
  Setup:        0 ms -   0%,   0%
  Process: 153,725 ms -   6%,   6%
Total:
  Setup:    1,119 ms
  Process: 2,270,519 ms
Input batches: 0, spills: 58, merge/spills: 0
{code}

In summary form:

{code}
08 Sort (EXTERNAL SORT)
  Process: 1,996,522 ms -  87%,  87%
10 Project
  Process: 45,571 ms -   2%,   2%
11 Scan (TEXT SUB SCAN)
  Process: 153,725 ms -   6%,   6%
{code}

The sort itself takes 87% of the time. Reading the original 18 GB of data takes 
just 6% of the time. The read rate during the scan is about 18 GB / 154 s = 116 
MB/s, which is pretty good. So, clearly the delay is in the sort itself 
somewhere.

For a baseline, ran a query that simply reads and discards data:

{code}
select * FROM (select * from dfs.data.`250wide.tbl`)d where d.columns[0] = 
'ljdfhwuehnoiueyf'
{code}

Ran in just 178 seconds. This confirms that the scan part of the query is fine. 
Read rate starts at 200 MB/s, then drops to 100 MB/s. CPU is at 97% suggesting 
that the query, even for a simple read, is CPU bound rather than disk bound. 
Scan time is 139 seconds or about 130 MB/s read rate.

> Sorting 19GB data with 14GB memory in a single fragment takes ~150 minutes
> --------------------------------------------------------------------------
>
>                 Key: DRILL-5528
>                 URL: https://issues.apache.org/jira/browse/DRILL-5528
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Relational Operators
>    Affects Versions: 1.10.0
>            Reporter: Rahul Challapalli
>            Assignee: Paul Rogers
>         Attachments: 26e2183f-2c5d-4044-0e23-4aeb45c40be5.sys.drill, 
> drillbit.log, drillbit.out, drill-env.sh
>
>
> Configuration :
> {code}
> git.commit.id.abbrev=1e0a14c
> DRILL_MAX_DIRECT_MEMORY="32G"
> DRILL_MAX_HEAP="4G"
> {code}
> Based on the runtime of the below query, I suspect there is a performance 
> bottleneck somewhere
> {code}
> [root@qa-node190 external-sort]# /opt/drill/bin/sqlline -u 
> jdbc:drill:zk=10.10.100.190:5181
> apache drill 1.11.0-SNAPSHOT
> "start your sql engine"
> 0: jdbc:drill:zk=10.10.100.190:5181> ALTER SESSION SET 
> `exec.sort.disable_managed` = false;
> +-------+-------------------------------------+
> |  ok   |               summary               |
> +-------+-------------------------------------+
> | true  | exec.sort.disable_managed updated.  |
> +-------+-------------------------------------+
> 1 row selected (0.975 seconds)
> 0: jdbc:drill:zk=10.10.100.190:5181> alter session set 
> `planner.width.max_per_node` = 1;
> +-------+--------------------------------------+
> |  ok   |               summary                |
> +-------+--------------------------------------+
> | true  | planner.width.max_per_node updated.  |
> +-------+--------------------------------------+
> 1 row selected (0.371 seconds)
> 0: jdbc:drill:zk=10.10.100.190:5181> alter session set 
> `planner.disable_exchanges` = true;
> +-------+-------------------------------------+
> |  ok   |               summary               |
> +-------+-------------------------------------+
> | true  | planner.disable_exchanges updated.  |
> +-------+-------------------------------------+
> 1 row selected (0.292 seconds)
> 0: jdbc:drill:zk=10.10.100.190:5181> alter session set 
> `planner.memory.max_query_memory_per_node` = 14106127360;
> +-------+----------------------------------------------------+
> |  ok   |                      summary                       |
> +-------+----------------------------------------------------+
> | true  | planner.memory.max_query_memory_per_node updated.  |
> +-------+----------------------------------------------------+
> 1 row selected (0.316 seconds)
> 0: jdbc:drill:zk=10.10.100.190:5181> select count(*) from (select * from 
> dfs.`/drill/testdata/resource-manager/250wide.tbl` order by columns[0])d 
> where d.columns[0] = 'ljdfhwuehnoiueyf';
> +---------+
> | EXPR$0  |
> +---------+
> | 0       |
> +---------+
> 1 row selected (8530.719 seconds)
> {code}
> I attached the logs and profile files. The data is too large to attach to a 
> jira. Reach out to me if you need any more information



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Comment Edited] (DRILL-5528) Sorting 19GB data with 14GB memory in a single fragment takes ~150 minutes

Reply via email to