Hi,

I'm working on an issue in the ORC reader in https://github.com/rapidsai/cudf. 
This reader uses the row index to parallelize the reads of row groups on the 
GPU.
I've found that the issue stems from an unexpected order of the row index 
streams: it does not seem to match the order of the corresponding data stream 
descriptors in the file footer.
In this specific case, the file footer lists the LENGTH stream of a string 
column before its DATA stream, but the row index streams seem to be stored 
in the opposite order.
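
To make the mismatch concrete, here is a toy sketch in Python. It is not cudf code, and not an exact model of the ORC row index format. The stream names are the ones from the case above. The offsets, the `assign` helper, and the one-position-per-stream simplification are all assumptions made up for illustration.

```python
# Toy model of the observed mismatch. All numbers are made up.
footer_streams = ["LENGTH", "DATA"]  # descriptor order as seen in the footer
index_layout = ["DATA", "LENGTH"]    # order the row index appears to follow

# Hypothetical positions for one row group, simplified to a single
# offset per stream and laid out in index_layout order.
positions = [100, 7]  # DATA -> 100, LENGTH -> 7


def assign(positions, order):
    """Pair each position with a stream kind, consuming them in `order`."""
    return dict(zip(order, positions))


# What a reader gets if it assumes the index follows the footer order.
assumed = assign(positions, footer_streams)
# What the file appears to actually contain.
actual = assign(positions, index_layout)

print("assumed:", assumed)
print("actual: ", actual)
```

In this sketch, a reader that consumes the positions in footer order ends up seeking the DATA stream with the LENGTH offset and vice versa, which matches the kind of failure I am seeing.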

So, my question is: what is the order of the row index streams in an ORC file 
(within each column)? Is it fixed for a given TypeKind, or are they meant to 
be ordered to correspond to the data stream order?

Thank you,
Vukasin
