[ https://issues.apache.org/jira/browse/DRILL-5715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

     [ https://issues.apache.org/jira/browse/DRILL-5715 ]

Kunal Khatua updated DRILL-5715:
--------------------------------
    Description: 
When running the following simple HashAgg-based query on a TPCH table (Lineitem, with 6 billion rows) on a 10-node setup (with a single partition to disable any possible spilling to disk)
{code:sql}
select count(*) 
from (
  select l_quantity
    , count(l_orderkey) 
  from lineitem 
  group by l_quantity 
)
{code}
the runtime increased from {{7.378 sec}} (Drill 1.10.0) to {{11.323 sec}} (Drill 1.11.0) [reported by the JDBC client].

To disable spill-to-disk in Drill-1.11.0, the {{drill-override.conf}} was modified to 
{code}drill.exec.hashagg.num_partitions : 1{code}

Attached are two profiles:
Drill 1.10.0 : [^2675cc73-9481-16e0-7d21-5f1338611e5f.sys.drill]
Drill 1.11.0 : [^2675de42-3789-47b8-29e8-c5077af136db.sys.drill]

A separate run was done for both scenarios with {{planner.width.max_per_node=10}} and profiled with YourKit.
Image snippets are attached, indicating the hotspots in both builds:

*Drill 1.10.0* :
 Profile: [^26736242-d084-6604-aac9-927e729da755.sys.drill]
 CallTree: [^drill-1.10.0_callTree.png]
 HotSpot: [^drill-1.10.0_hotspot.png]
!drill-1.10.0_hotspot.png|drill-1.10.0_hotspot!

*Drill 1.11.0* :
 Profile: [^26736615-9e86-dac9-ad77-b022fd791f67.sys.drill]
 CallTree: [^drill-1.11.0_callTree.png]
 HotSpot: [^drill-1.11.0_hotspot.png]
!drill-1.11.0_hotspot.png|drill-1.11.0_hotspot!

  was:
When running the following simple HashAgg-based query on a TPCH table (Lineitem, with 6 billion rows) on a 10-node setup (with a single partition to disable any possible spilling to disk)
{code:sql}
select count(*) 
from (
  select l_quantity
    , count(l_orderkey) 
  from lineitem 
  group by l_quantity 
)
{code}
the runtime increased from {{7.378 sec}} (Drill 1.10.0) to {{11.323 sec}} (Drill 1.11.0) [reported by the JDBC client].
To disable spill-to-disk in Drill-1.11.0, the {{drill-override.conf}} was modified to 
{code}drill.exec.hashagg.num_partitions : 1{code}

Attached are two profiles:
Drill 1.10.0 : [^2675cc73-9481-16e0-7d21-5f1338611e5f.sys.drill]
Drill 1.11.0 : [^2675de42-3789-47b8-29e8-c5077af136db.sys.drill]

A separate run was done for both scenarios with {{planner.width.max_per_node=10}} and profiled with YourKit.
Image snippets are attached, indicating the hotspots in both builds:

Drill 1.10.0 :
 Profile: [^26736242-d084-6604-aac9-927e729da755.sys.drill]
 CallTree: [^drill-1.10.0_callTree.png]
 HotSpot: [^drill-1.10.0_hotspot.png]

Drill 1.11.0 :
 Profile: [^26736615-9e86-dac9-ad77-b022fd791f67.sys.drill]
 CallTree: [^drill-1.11.0_callTree.png]
 HotSpot: [^drill-1.11.0_hotspot.png]


> Performance of refactored HashAgg operator regressed
> ----------------------------------------------------
>
>                 Key: DRILL-5715
>                 URL: https://issues.apache.org/jira/browse/DRILL-5715
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Codegen
>    Affects Versions: 1.11.0
>         Environment: 10-node RHEL 6.4 (32 Core, 256GB RAM)
>            Reporter: Kunal Khatua
>            Assignee: Boaz Ben-Zvi
>              Labels: performance, regression
>             Fix For: 1.12.0
>
>         Attachments: 26736242-d084-6604-aac9-927e729da755.sys.drill,
>                      26736615-9e86-dac9-ad77-b022fd791f67.sys.drill,
>                      2675cc73-9481-16e0-7d21-5f1338611e5f.sys.drill,
>                      2675de42-3789-47b8-29e8-c5077af136db.sys.drill,
>                      drill-1.10.0_callTree.png, drill-1.10.0_hotspot.png,
>                      drill-1.11.0_callTree.png, drill-1.11.0_hotspot.png
>
>
> When running the following simple HashAgg-based query on a TPCH table
> (Lineitem, with 6 billion rows) on a 10-node setup (with a single partition
> to disable any possible spilling to disk)
> {code:sql}
> select count(*)
> from (
>   select l_quantity
>     , count(l_orderkey)
>   from lineitem
>   group by l_quantity
> )
> {code}
> the runtime increased from {{7.378 sec}} (Drill 1.10.0) to {{11.323 sec}}
> (Drill 1.11.0) [reported by the JDBC client].
> To disable spill-to-disk in Drill-1.11.0, the {{drill-override.conf}} was
> modified to
> {code}drill.exec.hashagg.num_partitions : 1{code}
> Attached are two profiles:
> Drill 1.10.0 : [^2675cc73-9481-16e0-7d21-5f1338611e5f.sys.drill]
> Drill 1.11.0 : [^2675de42-3789-47b8-29e8-c5077af136db.sys.drill]
> A separate run was done for both scenarios with
> {{planner.width.max_per_node=10}} and profiled with YourKit.
> Image snippets are attached, indicating the hotspots in both builds:
> *Drill 1.10.0* :
>  Profile: [^26736242-d084-6604-aac9-927e729da755.sys.drill]
>  CallTree: [^drill-1.10.0_callTree.png]
>  HotSpot: [^drill-1.10.0_hotspot.png]
> !drill-1.10.0_hotspot.png|drill-1.10.0_hotspot!
> *Drill 1.11.0* :
>  Profile: [^26736615-9e86-dac9-ad77-b022fd791f67.sys.drill]
>  CallTree: [^drill-1.11.0_callTree.png]
>  HotSpot: [^drill-1.11.0_hotspot.png]
> !drill-1.11.0_hotspot.png|drill-1.11.0_hotspot!



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
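
For anyone re-running the comparison, the session-level part of the setup described in the report could look roughly like the following in Drill SQL. This is only a sketch under stated assumptions: the report gives the option name and value ({{planner.width.max_per_node=10}}) but not how it was applied, so the use of ALTER SESSION here is an assumption; the bare {{lineitem}} reference assumes the current schema already points at the TPCH data. The {{drill.exec.hashagg.num_partitions : 1}} setting is not shown because, per the report, it was made in {{drill-override.conf}} rather than through SQL.

```sql
-- Assumption: the report does not state how the option was applied;
-- setting it for the current session is one possible way.
ALTER SESSION SET `planner.width.max_per_node` = 10;

-- Query from the report, unchanged. Assumes the current schema
-- already resolves the lineitem table.
select count(*)
from (
  select l_quantity
    , count(l_orderkey)
  from lineitem
  group by l_quantity
);
```

Running the same two statements against both builds keeps the per-node parallelism identical, so any difference in elapsed time is less likely to come from planner width.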