tustvold commented on issue #5995:
URL: https://github.com/apache/arrow-datafusion/issues/5995#issuecomment-1507426189
I added the following line to `ParquetOpener::open`
```rust
println!(
    "Parquet partition {} reading row groups {:?}",
    partition_index, row_groups
);
```
And got the following output:
```
Parquet partition 1 reading row groups []
Parquet partition 2 reading row groups []
Parquet partition 0 reading row groups [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48]
Parquet partition 29 reading row groups []
Parquet partition 18 reading row groups []
Parquet partition 27 reading row groups []
Parquet partition 28 reading row groups []
Parquet partition 13 reading row groups []
Parquet partition 22 reading row groups []
Parquet partition 30 reading row groups []
Parquet partition 21 reading row groups []
Parquet partition 7 reading row groups []
Parquet partition 16 reading row groups []
Parquet partition 3 reading row groups []
Parquet partition 23 reading row groups []
Parquet partition 8 reading row groups []
Parquet partition 31 reading row groups []
Parquet partition 10 reading row groups []
Parquet partition 20 reading row groups []
Parquet partition 11 reading row groups []
Parquet partition 12 reading row groups []
Parquet partition 24 reading row groups []
Parquet partition 26 reading row groups []
Parquet partition 5 reading row groups []
Parquet partition 25 reading row groups []
Parquet partition 14 reading row groups []
Parquet partition 17 reading row groups []
Parquet partition 6 reading row groups []
Parquet partition 9 reading row groups []
Parquet partition 19 reading row groups []
Parquet partition 4 reading row groups []
Parquet partition 15 reading row groups []
```
So whilst it is creating lots of partitions, all the row groups appear to lie
in a single partition (partition 0). This explains why we are not seeing any
parallelism. Why this is the case needs more investigation.
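To check this kind of skew mechanically rather than by eye, here is a small sketch that summarises the debug output. It assumes each line has exactly the format printed by the `println!` above, one line per partition, with no line wrapping. The helper names `parse_line`, `row_groups_per_partition`, and `busy_partitions` are hypothetical and are not part of DataFusion.

```rust
use std::collections::BTreeMap;

/// Parses one line of the (assumed) debug format, e.g.
/// "Parquet partition 0 reading row groups [0, 1, 2]".
/// Returns (partition_index, row_groups), or None if the line does not match.
fn parse_line(line: &str) -> Option<(usize, Vec<usize>)> {
    let rest = line.trim().strip_prefix("Parquet partition ")?;
    let (index, rest) = rest.split_once(" reading row groups ")?;
    let partition: usize = index.parse().ok()?;
    // Strip the brackets produced by `{:?}` on the row group list.
    let inner = rest.strip_prefix('[')?.strip_suffix(']')?;
    let row_groups = inner
        .split(',')
        .map(str::trim)
        .filter(|s| !s.is_empty()) // an empty list "[]" yields no entries
        .map(|s| s.parse::<usize>().ok())
        .collect::<Option<Vec<usize>>>()?;
    Some((partition, row_groups))
}

/// Maps each partition index to the number of row groups it reads.
/// Lines that do not match the expected format are ignored.
fn row_groups_per_partition(log: &str) -> BTreeMap<usize, usize> {
    log.lines()
        .filter_map(parse_line)
        .map(|(partition, row_groups)| (partition, row_groups.len()))
        .collect()
}

/// Counts the partitions that were assigned at least one row group.
fn busy_partitions(counts: &BTreeMap<usize, usize>) -> usize {
    counts.values().filter(|&&n| n > 0).count()
}
```

Run against the output above, `busy_partitions` would report a single busy partition out of 32, which is the lack of parallelism described.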