Hi,

Based on pull request 1307, the latest test results are below. The performance has improved by 3 times (the same query now takes 3 seconds instead of 9 seconds).
presto:default> select province,sum(age),count(*) from presto_carbon_dict group by province order by province;
 province |  _col1   |  _col2
----------+----------+---------
 AB       | 57442740 | 1385010
 BC       | 57488826 | 1385580
 MB       | 57564702 | 1386510
 NB       | 57599520 | 1386960
 NL       | 57446592 | 1383774
 NS       | 57448734 | 1384272
 NT       | 57534228 | 1386936
 NU       | 57506844 | 1385346
 ON       | 57484956 | 1384470
 PE       | 57325164 | 1379802
 QC       | 57467886 | 1385076
 SK       | 57385152 | 1382364
 YT       | 57377556 | 1383900
(13 rows)

Query 20170902_033821_00006_h6g24, FINISHED, 1 node
Splits: 50 total, 50 done (100.00%)
0:03 [18M rows, 0B] [6.62M rows/s, 0B/s]

Regards
Liang

Liang Chen wrote
> Hi
>
> For "4) Lazy decoding of the dictionary", I tested 18 million rows of data with this query:
> "select province,sum(age),count(*) from presto_carbondata group by province order by province"
>
> The Spark integration module has "dictionary lazy decode" but Presto doesn't, and the performance differs by about 4.5 times, so "dictionary lazy decode" could significantly improve aggregation performance.
>
> The detailed test results are below:
>
> 1. Presto+CarbonData takes 9 seconds:
>
> presto:default> select province,sum(age),count(*) from presto_carbondata group by province order by province;
>  province |  _col1   |  _col2
> ----------+----------+---------
>  AB       | 57442740 | 1385010
>  BC       | 57488826 | 1385580
>  MB       | 57564702 | 1386510
>  NB       | 57599520 | 1386960
>  NL       | 57446592 | 1383774
>  NS       | 57448734 | 1384272
>  NT       | 57534228 | 1386936
>  NU       | 57506844 | 1385346
>  ON       | 57484956 | 1384470
>  PE       | 57325164 | 1379802
>  QC       | 57467886 | 1385076
>  SK       | 57385152 | 1382364
>  YT       | 57377556 | 1383900
> (13 rows)
>
> Query 20170720_022833_00004_c9ky2, FINISHED, 1 node
> Splits: 55 total, 55 done (100.00%)
> 0:09 [18M rows, 34.3MB] [1.92M rows/s, 3.65MB/s]
>
> 2. Spark+CarbonData takes 2 seconds:
>
> scala> benchmark { carbon.sql("select province,sum(age),count(*) from presto_carbondata group by province order by province").show }
> +--------+--------+--------+
> |province|sum(age)|count(1)|
> +--------+--------+--------+
> |      AB|57442740| 1385010|
> |      BC|57488826| 1385580|
> |      MB|57564702| 1386510|
> |      NB|57599520| 1386960|
> |      NL|57446592| 1383774|
> |      NS|57448734| 1384272|
> |      NT|57534228| 1386936|
> |      NU|57506844| 1385346|
> |      ON|57484956| 1384470|
> |      PE|57325164| 1379802|
> |      QC|57467886| 1385076|
> |      SK|57385152| 1382364|
> |      YT|57377556| 1383900|
> +--------+--------+--------+
>
> 2109.346231ms

--
Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
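For readers unfamiliar with the term, here is a minimal sketch of the general idea behind "dictionary lazy decode" for the province aggregation above. It assumes the province column is stored as Int surrogate keys plus a key-to-string dictionary. The object LazyDecodeSketch, the functions eagerAggregate and lazyAggregate, and the toy data are all hypothetical. This is not the actual CarbonData, Spark or Presto code. The eager variant decodes a string for every row before grouping. The lazy variant groups on the integer keys and decodes only once per distinct key, which here is 13 provinces rather than 18 million rows.

```scala
// Hypothetical illustration of dictionary lazy decode; not CarbonData's implementation.
// Assumption: a dictionary-encoded column = Int surrogate keys + dict(key) -> String.
import scala.collection.mutable

object LazyDecodeSketch {
  // Eager decode: look up the String for every row, then group on the String.
  def eagerAggregate(keys: Array[Int], ages: Array[Long],
                     dict: Array[String]): Map[String, (Long, Long)] = {
    val acc = mutable.HashMap.empty[String, (Long, Long)]
    var i = 0
    while (i < keys.length) {
      val name = dict(keys(i)) // one dictionary lookup per row
      val (s, c) = acc.getOrElse(name, (0L, 0L))
      acc(name) = (s + ages(i), c + 1)
      i += 1
    }
    acc.toMap
  }

  // Lazy decode: aggregate sum/count on the Int keys, decode only at the end.
  def lazyAggregate(keys: Array[Int], ages: Array[Long],
                    dict: Array[String]): Map[String, (Long, Long)] = {
    val sums = new Array[Long](dict.length)
    val counts = new Array[Long](dict.length)
    var i = 0
    while (i < keys.length) {
      sums(keys(i)) += ages(i) // group by surrogate key, no string work
      counts(keys(i)) += 1
      i += 1
    }
    // Decode step: one lookup per distinct key that actually occurred.
    dict.indices
      .filter(k => counts(k) > 0)
      .map(k => (dict(k), (sums(k), counts(k))))
      .toMap
  }

  def main(args: Array[String]): Unit = {
    val dict = Array("AB", "BC", "MB")
    val keys = Array(0, 1, 0, 2, 1, 0)
    val ages = Array(30L, 40L, 20L, 50L, 10L, 25L)
    val result = lazyAggregate(keys, ages, dict)
    // Both strategies must give the same answer; only the decode cost differs.
    assert(result == eagerAggregate(keys, ages, dict))
    result.toSeq.sortBy(_._1).foreach { case (p, (s, c)) => println(s"$p $s $c") }
  }
}
```

Running main prints "AB 75 3", "BC 50 2" and "MB 50 1". Both functions return the same result. The lazy version simply moves the string lookups out of the per-row loop, which is why it can help aggregation queries over low-cardinality dictionary columns.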