Github user chenliang613 commented on the issue:
https://github.com/apache/carbondata/pull/2412
I used the script below to generate the test data:
```scala
import scala.util.Random

val r = new Random()
val df = spark.sparkContext.parallelize(1 to 1000000000)
  .map(x => ("No." + r.nextInt(10000), "country" + x % 8, "city" + x % 50, x % 300))
  .toDF("ID", "country", "city", "population")
```
Two issues:
1. On the Presto client, I ran the same query twice but got different results:
```
presto:default> select country,sum(population) from carbon_table group by country;
country | _col1
----------+-------------
country4 | 18508531250
country2 | 18758431703
country0 | 18508717865
country7 | 18884021774
country1 | 18633160595
country5 | 18633480022
country6 | 18757895175
country3 | 18883151243
(8 rows)
Query 20180630_041406_00004_crn9q, FINISHED, 1 node
Splits: 65 total, 65 done (100.00%)
1:01 [1000M rows, 8.4GB] [16.5M rows/s, 142MB/s]
presto:default> select country,sum(population) from carbon_table group by country;
country | _col1
----------+-------------
country4 | 18500014852
country0 | 18499993972
country5 | 18624989449
country1 | 18625008398
country3 | 18874966666
country6 | 18749995166
country7 | 18874992446
country2 | 18749999687
(8 rows)
Query 20180630_041510_00005_crn9q, FINISHED, 1 node
Splits: 65 total, 65 done (100.00%)
0:59 [1000M rows, 8.4GB] [17M rows/s, 146MB/s]
```
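Note that `population` (`x % 300`) and `country` (`"country" + x % 8`) are deterministic functions of `x`, so the correct per-country sums can be computed analytically and compared against both runs. A minimal sketch (my own check, not part of the original script): both columns repeat with period lcm(8, 300) = 600, so one cycle plus the leftover rows suffices.

```scala
// Analytic per-country sums for the generated data:
// country = "country" + x % 8, population = x % 300, x = 1..n.
// Both columns repeat every lcm(8, 300) = 600 rows, so brute-force one cycle.
def expectedSums(n: Long): IndexedSeq[Long] = {
  val cycle = 600L
  val full  = n / cycle            // number of complete 600-row cycles
  val rem   = n % cycle            // leftover rows; x = full*600 + k has the
                                   // same residues mod 8 and mod 300 as k
  val perCycle = new Array[Long](8)
  val perRem   = new Array[Long](8)
  var x = 1L
  while (x <= cycle) {
    val c = (x % 8).toInt
    perCycle(c) += x % 300
    if (x <= rem) perRem(c) += x % 300
    x += 1
  }
  (0 until 8).map(c => full * perCycle(c) + perRem(c))
}

val totals = expectedSums(1000000000L)
totals.zipWithIndex.foreach { case (s, c) => println(s"country$c: $s") }
// country0: 18499998900
println(s"grand total: ${totals.sum}")
// grand total: 149499990100
```

These expected values match the Spark output shown below exactly, but neither Presto run above (e.g. country0 = 18508717865 and 18499993972 vs. the expected 18499998900), so both Presto results look incorrect, not just inconsistent.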
2. For aggregation over the 1-billion-row data set, Presto performance is much lower than Spark's: Presto takes around 1 minute, while Spark takes around 33 seconds, as shown below:
```
scala> benchmark { carbon.sql("select country,sum(population) from carbon_table group by country").show }
+--------+---------------+
| country|sum(population)|
+--------+---------------+
|country4| 18499998700|
|country1| 18624998800|
|country3| 18874998800|
|country7| 18874998700|
|country2| 18749998800|
|country6| 18749998700|
|country5| 18624998700|
|country0| 18499998900|
+--------+---------------+
33849.999703ms
```
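The `benchmark` helper used in the Spark shell session above is not shown; a common Scala sketch of such a helper (an assumption on my part, not necessarily the exact one used here) times a block and prints the elapsed milliseconds:

```scala
// Hypothetical timing wrapper; the actual `benchmark` definition from the
// session is not shown in this comment.
def benchmark[T](body: => T): T = {
  val start = System.nanoTime()
  val result = body                              // evaluate the by-name block
  val elapsedMs = (System.nanoTime() - start) / 1e6
  println(s"${elapsedMs}ms")                     // e.g. "33849.999703ms"
  result
}
```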
---