[
https://issues.apache.org/jira/browse/DRILL-5528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16043607#comment-16043607
]
Paul Rogers edited comment on DRILL-5528 at 6/8/17 11:37 PM:
-------------------------------------------------------------
Preliminary results suggest that Drill is severely CPU bound for this query.
Running on a Mac, the thread running the query is at 100% of CPU. But, disk I/O
during the read-sort-and-spill phase is very small, only about 40MB/s for
reads, 20 MB/s for writes peak, with many valleys in between -- on an SSD that
can do 700 MB/s. Earlier tests got upwards of 70+ MB/s.
Later, during the merge phase, read performance jumps to 150 MB/s. CPU is still
around 95%.
Total run time was just over an hour for a file that formerly took just a few
minutes on previous test variations.
This suggests that, somehow, something is using far to much CPU to do the work
of the sort. Will have to investigate. Note, the above is on the DRILL-5325
branch, not master.
Time summary:
{code}
Results: 1 records, 2 batches, 3,608,140 ms
00 Screen
Setup: 0 ms - 0%, 0%
Process: 4 ms - 0%, 0%
01 Project
Setup: 5 ms - 0%, 0%
Process: 1 ms - 0%, 0%
02 StreamAgg (STREAMING AGGREGATE)
Setup: 133 ms - 11%, 0%
Process: 4 ms - 0%, 0%
03 Project
Setup: 53 ms - 4%, 0%
Process: 0 ms - 0%, 0%
04 SelectionVectorRemover (SELECTION VECTOR REMOVER)
Setup: 51 ms - 4%, 0%
Process: 501 ms - 0%, 0%
05 Filter
Setup: 194 ms - 17%, 0%
Process: 14,910 ms - 0%, 0%
06 Project
Setup: 41 ms - 3%, 0%
Process: 1,414 ms - 0%, 0%
07 SelectionVectorRemover (SELECTION VECTOR REMOVER)
Setup: 6 ms - 0%, 0%
Process: 33,020 ms - 1%, 1%
08 Sort (EXTERNAL SORT)
Setup: 0 ms - 0%, 0%
Process: 1,996,522 ms - 87%, 87%
09 Project
Setup: 137 ms - 12%, 0%
Process: 24,847 ms - 1%, 1%
10 Project
Setup: 499 ms - 44%, 0%
Process: 45,571 ms - 2%, 2%
11 Scan (TEXT SUB SCAN)
Setup: 0 ms - 0%, 0%
Process: 153,725 ms - 6%, 6%
Total:
Setup: 1,119 ms
Process: 2,270,519 ms
Input batches: 0, spills: 58, merge/spills: 0
{code}
In summary form:
{code}
08 Sort (EXTERNAL SORT)
Process: 1,996,522 ms - 87%, 87%
10 Project
Process: 45,571 ms - 2%, 2%
11 Scan (TEXT SUB SCAN)
Process: 153,725 ms - 6%, 6%
{code}
The sort itself takes 87% of the time. Reading the original 18 GB of data takes
just 6% of the time. The read rate during the scan is about 18 GB / 154 s = 116
MB/s, which is pretty good. So, clearly the delay is in the sort itself
somewhere.
was (Author: paul-rogers):
Preliminary results suggest that Drill is severely CPU bound for this query.
Running on a Mac, the thread running the query is at 100% of CPU. But, disk I/O
during the read-sort-and-spill phase is very small, only about 40MB/s for
reads, 20 MB/s for writes peak, with many valleys in between -- on an SSD that
can do 700 MB/s. Earlier tests got upwards of 70+ MB/s.
Later, during the merge phase, read performance jumps to 150 MB/s. CPU is still
around 95%.
This suggests that, somehow, something is using far to much CPU to do the work
of the sort. Will have to investigate. Note, the above is on the DRILL-5325
branch, not master.
The original goal of the "managed sort" project was just to manage the memory
used by the sort. Fixing performance issues, while important, starts wandering
a bit off of the original goals...
> Sorting 19GB data with 14GB memory in a single fragment takes ~150 minutes
> --------------------------------------------------------------------------
>
> Key: DRILL-5528
> URL: https://issues.apache.org/jira/browse/DRILL-5528
> Project: Apache Drill
> Issue Type: Bug
> Components: Execution - Relational Operators
> Affects Versions: 1.10.0
> Reporter: Rahul Challapalli
> Assignee: Paul Rogers
> Attachments: 26e2183f-2c5d-4044-0e23-4aeb45c40be5.sys.drill,
> drillbit.log, drillbit.out, drill-env.sh
>
>
> Configuration :
> {code}
> git.commit.id.abbrev=1e0a14c
> DRILL_MAX_DIRECT_MEMORY="32G"
> DRILL_MAX_HEAP="4G"
> {code}
> Based on the runtime of the below query, I suspect there is a performance
> bottleneck somewhere
> {code}
> [root@qa-node190 external-sort]# /opt/drill/bin/sqlline -u
> jdbc:drill:zk=10.10.100.190:5181
> apache drill 1.11.0-SNAPSHOT
> "start your sql engine"
> 0: jdbc:drill:zk=10.10.100.190:5181> ALTER SESSION SET
> `exec.sort.disable_managed` = false;
> +-------+-------------------------------------+
> | ok | summary |
> +-------+-------------------------------------+
> | true | exec.sort.disable_managed updated. |
> +-------+-------------------------------------+
> 1 row selected (0.975 seconds)
> 0: jdbc:drill:zk=10.10.100.190:5181> alter session set
> `planner.width.max_per_node` = 1;
> +-------+--------------------------------------+
> | ok | summary |
> +-------+--------------------------------------+
> | true | planner.width.max_per_node updated. |
> +-------+--------------------------------------+
> 1 row selected (0.371 seconds)
> 0: jdbc:drill:zk=10.10.100.190:5181> alter session set
> `planner.disable_exchanges` = true;
> +-------+-------------------------------------+
> | ok | summary |
> +-------+-------------------------------------+
> | true | planner.disable_exchanges updated. |
> +-------+-------------------------------------+
> 1 row selected (0.292 seconds)
> 0: jdbc:drill:zk=10.10.100.190:5181> alter session set
> `planner.memory.max_query_memory_per_node` = 14106127360;
> +-------+----------------------------------------------------+
> | ok | summary |
> +-------+----------------------------------------------------+
> | true | planner.memory.max_query_memory_per_node updated. |
> +-------+----------------------------------------------------+
> 1 row selected (0.316 seconds)
> 0: jdbc:drill:zk=10.10.100.190:5181> select count(*) from (select * from
> dfs.`/drill/testdata/resource-manager/250wide.tbl` order by columns[0])d
> where d.columns[0] = 'ljdfhwuehnoiueyf';
> +---------+
> | EXPR$0 |
> +---------+
> | 0 |
> +---------+
> 1 row selected (8530.719 seconds)
> {code}
> I attached the logs and profile files. The data is too large to attach to a
> jira. Reach out to me if you need any more information
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)