[
https://issues.apache.org/jira/browse/DRILL-4877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Aman Sinha resolved DRILL-4877.
-------------------------------
Resolution: Fixed
Fix Version/s: 1.9.0
Fixed in commit #: 18866d5
> max(dir0), max(dir1) query against parquet data slower by 2X
> ------------------------------------------------------------
>
> Key: DRILL-4877
> URL: https://issues.apache.org/jira/browse/DRILL-4877
> Project: Apache Drill
> Issue Type: Bug
> Components: Query Planning & Optimization
> Affects Versions: 1.9.0
> Environment: 4 node cluster centos
> Reporter: Khurram Faraaz
> Assignee: Aman Sinha
> Priority: Critical
> Fix For: 1.9.0
>
>
> max(dir0), max(dir1) query against parquet data slower by 2X
> test was run with meta data cache on both 1.7.0 and 1.9.0
> there is a difference in query plan and also execution time on 1.9.0 is close
> to 2X that on 1.7.0
> Test from Drill 1.9.0 git commit id: 28d315bb
> on 4 node Centos cluster
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> select max(dir0), max(dir1), max(dir2) from
> `DRILL_4589`;
> +---------+---------+---------+
> | EXPR$0 | EXPR$1 | EXPR$2 |
> +---------+---------+---------+
> | 2015 | Q4 | null |
> +---------+---------+---------+
> 1 row selected (70.644 seconds)
> {noformat}
> Query plan for the above query, note than in Drill 1.9.0 usedMetadataFile is
> not available is the query plan text.
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> explain plan for select max(dir0), max(dir1),
> max(dir2) from `DRILL_4589`;
> +------+------+
> | text | json |
> +------+------+
> | 00-00 Screen
> 00-01 Project(EXPR$0=[$0], EXPR$1=[$1], EXPR$2=[$2])
> 00-02 StreamAgg(group=[{}], EXPR$0=[MAX($0)], EXPR$1=[MAX($1)],
> EXPR$2=[MAX($2)])
> 00-03 UnionExchange
> 01-01 StreamAgg(group=[{}], EXPR$0=[MAX($0)], EXPR$1=[MAX($1)],
> EXPR$2=[MAX($2)])
> 01-02 Scan(groupscan=[ParquetGroupScan
> [entries=[ReadEntryWithPath [path=/tmp/DRILL_4589/1990/Q1/f672.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/2011/Q4/f162.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/2000/Q2/f1101.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/1996/Q2/f110.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/2006/Q3/f1192.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/1999/Q2/f174.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/2006/Q4/f885.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/2001/Q3/f1720.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/2001/Q1/f1779.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/1991/Q2/f629.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/2003/Q4/f821.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/2015/Q3/f896.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/2002/Q2/f1458.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/2004/Q4/f1756.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/2001/Q2/f1490.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/2003/Q3/f1137.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/2013/Q1/f561.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/1990/Q3/f1562.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/2003/Q1/f1445.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/2006/Q1/f236.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/1992/Q4/f1209.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/2014/Q2/f518.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/1993/Q4/f1598.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/2008/Q1/f780.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/1999/Q1/f1763.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/1990/Q4/f381.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/1990/Q1/f1870.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/2014/Q1/f915.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/2001/Q2/f673.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/1998/Q1/f736.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/2013/Q2/f749.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/2007/Q3/f111.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/1993/Q3/f776.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/2002/Q1/f403.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/2005/Q2/f904.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/2000/Q4/f944.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/1994/Q2/f506.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/1994/Q4/f612.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/1991/Q1/f1838.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/2012/Q2/f1764.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/2010/Q1/f684.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/2005/Q4/f176.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/1991/Q4/f150.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/2012/Q3/f832.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/1997/Q1/f967.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/2000/Q4/f1733.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/2008/Q2/f383.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/1995/Q2/f1572.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/1991/Q4/f1241.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/1996/Q4/f1111.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/2005/Q2/f1911.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/1998/Q4/f1468.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/2014/Q4/f1122.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/2013/Q2/f1147.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/2015/Q4/f1445.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/2006/Q1/f1649.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/2005/Q1/f1615.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/2008/Q3/f1947.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/2007/Q3/f1913.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/1995/Q3/f1432.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/2015/Q2/f353.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/2000/Q2/f838.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/2011/Q1/f1145.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/2010/Q1/f1111.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/2013/Q3/f1443.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/1997/Q4/f676.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/2011/Q4/f89.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/1994/Q1/f1893.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/2015/Q1/f1168.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/2014/Q1/f1134.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/1991/Q1/f441.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/2004/Q3/f1924.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/1995/Q2/f341.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/2014/Q2/f1430.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/2003/Q3/f969.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/1996/Q1/f1123.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/1997/Q1/f1157.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/1999/Q3/f1455.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/1998/Q3/f1421.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/2007/Q4/f654.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/1999/Q2/f1159.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/2014/Q4/f624.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/2010/Q2/f287.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/1992/Q2/f1583.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/1994/Q4/f1881.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/2007/Q3/f18.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/2000/Q4/f59.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/2004/Q3/f738.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/1996/Q4/f1298.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/1995/Q1/f1740.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/1995/Q4/f1264.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/2002/Q4/f299.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/2005/Q1/f467.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/2013/Q1/f1751.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/1993/Q3/f1262.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/1992/Q1/f1287.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/2007/Q1/f945.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/2012/Q1/f87.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/1994/Q3/f1585.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/1999/Q4/f214.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/1996/Q1/f258.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/2002/Q4/f112.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/1997/Q2/f1742.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/1998/Q2/f1776.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/2005/Q4/f1603.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/2004/Q2/f550.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/2014/Q2/f584.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/2015/Q2/f1753.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/1990/Q3/f712.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/2005/Q3/f507.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/2004/Q1/f698.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/1998/Q4/f445.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/1997/Q3/f422.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/2013/Q4/f204.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/2003/Q4/f1433.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/1991/Q2/f695.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/2000/Q1/f1456.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/2000/Q1/f1745.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/2005/Q3/f573.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/2013/Q4/f855.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/2001/Q2/f1424.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/1996/Q4/f61.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/1990/Q1/f606.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/2010/Q4/f1575.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/2001/Q4/f1116.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/1991/Q3/f1596.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/2004/Q1/f1479.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/2002/Q1/f1411.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/2006/Q1/f591.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/1996/Q2/f20.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/1993/Q4/f379.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/2014/Q2/f873.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/2000/Q2/f84.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/2001/Q1/f57.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/2000/Q1/f361.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/2000/Q3/f1148.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/2013/Q1/f206.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/1995/Q1/f489.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/1992/Q4/f1564.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/2005/Q2/f1447.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/1999/Q4/f280.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/2010/Q4/f327.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/1990/Q3/f1207.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/2003/Q4/f85.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/1994/Q4/f799.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/2008/Q4/f423.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/2003/Q2/f715.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/2004/Q3/f58.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/2011/Q2/f996.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/2012/Q1/f437.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/2011/Q2/f520.parquet],
> ReadEntryWithPath [path=/tmp/DRILL_4589/1993/Q3/f300.parquet], ReadEntryWithP
> |
> +------+------+
> 1 row selected (26.19 seconds)
> {noformat}
> Details of directory structure and size in bytes
> There are 26 sub-directories with each sub directory having four more
> sub-directories Q1,Q2,Q3 and Q4
> Each such directory having 2000 small parquet files.
> {noformat}
> [root@centos-01 ~]# hadoop fs -du /tmp/DRILL_4589
> 178702459 /tmp/DRILL_4589/.drill.parquet_metadata
> 112420427 /tmp/DRILL_4589/1990
> 112433621 /tmp/DRILL_4589/1991
> 112433621 /tmp/DRILL_4589/1992
> 112433621 /tmp/DRILL_4589/1993
> 112433621 /tmp/DRILL_4589/1994
> 112433621 /tmp/DRILL_4589/1995
> 112433621 /tmp/DRILL_4589/1996
> 112433621 /tmp/DRILL_4589/1997
> 112433621 /tmp/DRILL_4589/1998
> 112433621 /tmp/DRILL_4589/1999
> 112433621 /tmp/DRILL_4589/2000
> 112433621 /tmp/DRILL_4589/2001
> 112433621 /tmp/DRILL_4589/2002
> 112433621 /tmp/DRILL_4589/2003
> 112433621 /tmp/DRILL_4589/2004
> 112433621 /tmp/DRILL_4589/2005
> 112433621 /tmp/DRILL_4589/2006
> 112433621 /tmp/DRILL_4589/2007
> 112433621 /tmp/DRILL_4589/2008
> 112433621 /tmp/DRILL_4589/2009
> 112433621 /tmp/DRILL_4589/2010
> 112433621 /tmp/DRILL_4589/2011
> 112433621 /tmp/DRILL_4589/2012
> 112433621 /tmp/DRILL_4589/2013
> 112433621 /tmp/DRILL_4589/2014
> 112433621 /tmp/DRILL_4589/2015
> total size in bytes of DRILL_4589 directory = 3101633561 => 3.1GB
> {noformat}
> Test result from 1.7.0-SNAPSHOT git commit ID: f7197596
> on 4 node Centos cluster
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> select max(dir0), max(dir1), max(dir2) from
> `DRILL_4589`;
> +---------+---------+---------+
> | EXPR$0 | EXPR$1 | EXPR$2 |
> +---------+---------+---------+
> | 2015 | Q4 | null |
> +---------+---------+---------+
> 1 row selected (38.05 seconds)
> {noformat}
> Query plan for above query from Drill 1.7.0 note that usedMetadataFile=true
> in the query plan
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> explain plan for select max(dir0), max(dir1),
> max(dir2) from `DRILL_4589`;
> +------+------+
> | text | json |
> +------+------+
> | 00-00 Screen
> 00-01 Project(EXPR$0=[$0], EXPR$1=[$1], EXPR$2=[$2])
> 00-02 StreamAgg(group=[{}], EXPR$0=[MAX($0)], EXPR$1=[MAX($1)],
> EXPR$2=[MAX($2)])
> 00-03 UnionExchange
> 01-01 StreamAgg(group=[{}], EXPR$0=[MAX($0)], EXPR$1=[MAX($1)],
> EXPR$2=[MAX($2)])
> 01-02 Scan(groupscan=[ParquetGroupScan
> [entries=[ReadEntryWithPath [path=maprfs:///tmp/DRILL_4589]],
> selectionRoot=/tmp/DRILL_4589, numFiles=1, usedMetadataFile=true,
> columns=[`dir0`, `dir1`, `dir2`]]])
> {noformat}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)