[https://issues.apache.org/jira/browse/DRILL-5715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]
Kunal Khatua updated DRILL-5715:
--------------------------------
Description:
When running the following simple HashAgg-based query against the TPC-H table
Lineitem (6 billion rows) on a 10-node setup (with a single partition, to
disable any possible spilling to disk)
{code:sql}
select count(*)
from (
select l_quantity
, count(l_orderkey)
from lineitem
group by l_quantity
) {code}
the runtime, as reported by the JDBC client, increased from {{7.378 sec}} on
Drill 1.10.0 to {{11.323 sec}} on Drill 1.11.0.
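For scale, the relative slowdown can be worked out from the two JDBC-reported runtimes, treating each as a single measurement (no repeat runs or variance are given in this report):

{code}
\frac{11.323 - 7.378}{7.378} = \frac{3.945}{7.378} \approx 0.535
\qquad \text{i.e.} \qquad \frac{11.323}{7.378} \approx 1.53\times
{code}

So Drill 1.11.0 took roughly 53% longer than Drill 1.10.0 on this query.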
To disable spill-to-disk in Drill 1.11.0, {{drill-override.conf}} was
modified to include:
{code}drill.exec.hashagg.num_partitions : 1{code}
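The same setting can equivalently be written in HOCON's nested form, which is how {{drill-override.conf}} is usually laid out. This is only a sketch of the relevant fragment; any other keys already in the file (cluster ID, ZooKeeper address, etc.) are assumed to stay as they are. Because this is a boot-time setting, it is assumed to take effect only after the Drillbits are restarted.

{code}
drill.exec: {
  hashagg: {
    # A single partition; per this report, this disables HashAgg spilling to disk
    num_partitions: 1
  }
}
{code}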
Two profiles are attached:
Drill 1.10.0 : [^2675cc73-9481-16e0-7d21-5f1338611e5f.sys.drill]
Drill 1.11.0 : [^2675de42-3789-47b8-29e8-c5077af136db.sys.drill]
A separate run was done for both versions with
{{planner.width.max_per_node=10}}, and each was profiled with YourKit.
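For anyone reproducing the profiled runs, the per-node width can be set from a SQL client. The statement below is a sketch that assumes session scope was used; the report does not say whether the option was set with {{ALTER SESSION}} or system-wide with {{ALTER SYSTEM}}.

{code:sql}
-- Cap per-node parallelism for queries in this session at 10 threads
ALTER SESSION SET `planner.width.max_per_node` = 10;
{code}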
Image snippets are attached, indicating the hotspots in both builds:
Drill 1.10.0 :
Profile: [^26736242-d084-6604-aac9-927e729da755.sys.drill]
HotSpot: [^drill-1.10.0_hotspot.png]
Drill 1.11.0 :
Profile: [^26736615-9e86-dac9-ad77-b022fd791f67.sys.drill]
HotSpot: [^drill-1.11.0_hotspot.png]
> Performance of refactored HashAgg operator regressed
> ----------------------------------------------------
>
> Key: DRILL-5715
> URL: https://issues.apache.org/jira/browse/DRILL-5715
> Project: Apache Drill
> Issue Type: Bug
> Components: Execution - Codegen
> Affects Versions: 1.11.0
> Environment: 10-node RHEL 6.4 (32 Core, 256GB RAM)
> Reporter: Kunal Khatua
> Assignee: Boaz Ben-Zvi
> Labels: performance, regression
> Fix For: 1.12.0
>
> Attachments: 26736242-d084-6604-aac9-927e729da755.sys.drill,
> 26736615-9e86-dac9-ad77-b022fd791f67.sys.drill,
> 2675cc73-9481-16e0-7d21-5f1338611e5f.sys.drill,
> 2675de42-3789-47b8-29e8-c5077af136db.sys.drill, drill-1.10.0_callTree.png,
> drill-1.10.0_hotspot.png, drill-1.11.0_callTree.png, drill-1.11.0_hotspot.png
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)