FelixYBW commented on issue #11534:
URL:
https://github.com/apache/incubator-gluten/issues/11534#issuecomment-3832590268
Here are some test results. We need to set block.size to 500M to get the same
row group count as Spark.
The SQL is part of Q9:
```
select count(*)
from store_sales
where ss_quantity between 1 and 20
```
Test | zstd.level | block.size | block.rows | Elapsed Time
-- | -- | -- | -- | --
Test 1 | 0 | 128m | 100m | 40s
Test 2 | 3 | 128m | 100m | 35s
Test 3 | 3 | 256m | 100m | 25s
Test 4 | 3 | 512m | 100m | 16s
Test 5 | 3 | 512m | 400m | 17s
Test 6 | 3 | 512m | 800m | 17s
Metric | Test 1 | Test 2 | Test 3 | Test 4 | Test 5 | Test 6
-- | -- | -- | -- | -- | -- | --
Elapsed Time | 40s | 35s | 25s | 16s | 17s | 17s
Scan & Filter Time | 22.6m | 22.2m | 15.8m | 9.5m | 9.4m | 9.6m
IO Wait Time | 2.91h | 2.90h | 1.99h | 1.18h | 1.18h | 1.20h
Scan Time | 2.57h | 2.56h | 1.76h | 1.05h | 1.06h | 1.07h
Page Load Time | 20.6m | 20.3m | 13.7m | 7.4m | 7.4m | 7.6m
Data Source Read Time | 21.0m | 20.7m | 14.1m | 7.8m | 7.8m | 8.0m
Metric | Test 1 | Test 2 | Test 3 | Test 4 | Test 5 | Test 6
-- | -- | -- | -- | -- | -- | --
Storage Read Bytes | 5.9 GiB | 5.9 GiB | 5.9 GiB | 5.9 GiB | 5.9 GiB | 5.9 GiB
Size of Files Read | 256.6 GiB | 256.6 GiB | 245.1 GiB | 237.6 GiB | 237.6 GiB | 237.6 GiB
Peak Memory | 2.5 GiB | 2.5 GiB | 2.4 GiB | 2.8 GiB | 2.9 GiB | 2.9 GiB
Output Bytes | 40.1 GiB | 40.1 GiB | 40.1 GiB | 40.1 GiB | 40.1 GiB | 40.1 GiB
Metric | Test 1 | Test 2 | Test 3 | Test 4 | Test 5 | Test 6
-- | -- | -- | -- | -- | -- | --
Processed Row Groups | 8,727 | 8,727 | 4,894 | 2,677 | 2,677 | 2,677
Output Rows | 1,375,234,677 | 1,375,234,677 | 1,375,234,677 | 1,375,234,677 | 1,375,234,677 | 1,375,234,677
Raw Input Rows | 7,199,920,789 | 7,199,920,789 | 7,199,920,789 | 7,199,920,789 | 7,199,920,789 | 7,199,920,789
Output Vectors | 1,763,258 | 1,763,213 | 1,760,583 | 1,759,144 | 1,759,159 | 1,759,168
Memory Allocations | 3,617,435 | 3,617,334 | 3,594,017 | 3,579,321 | 3,579,441 | 3,579,406
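
To make the trend in the tables easier to read, here is a rough sketch that derives the average rows per row group and the speedup over Test 1. The numbers are copied from the tables above. The helper names are made up for illustration, and the per-row-group average assumes Raw Input Rows is spread evenly across Processed Row Groups, which may not hold exactly.

```python
# Figures copied from the tables above.
RAW_INPUT_ROWS = 7_199_920_789

# test name -> (elapsed seconds, processed row groups)
TESTS = {
    "Test 1": (40, 8_727),
    "Test 2": (35, 8_727),
    "Test 3": (25, 4_894),
    "Test 4": (16, 2_677),
    "Test 5": (17, 2_677),
    "Test 6": (17, 2_677),
}


def rows_per_row_group(raw_rows: int, row_groups: int) -> int:
    """Average rows per row group (integer division; assumes an even spread)."""
    return raw_rows // row_groups


def speedup(baseline_seconds: float, elapsed_seconds: float) -> float:
    """How many times faster a run is than the baseline run."""
    return baseline_seconds / elapsed_seconds


if __name__ == "__main__":
    baseline = TESTS["Test 1"][0]
    for name, (elapsed, groups) in TESTS.items():
        avg_rows = rows_per_row_group(RAW_INPUT_ROWS, groups)
        print(f"{name}: ~{avg_rows:,} rows/row group, "
              f"{speedup(baseline, elapsed):.2f}x vs Test 1")
```

Under that assumption, going from 8,727 to 2,677 row groups roughly triples the average row group size, and elapsed time drops from 40s to 16s (2.5x).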
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]