LouisClt opened a new issue, #14673:
URL: https://github.com/apache/arrow/issues/14673

   Hello, I need to read only part of an ORC file (for instance, records 5000 
to 6500).
   The goal is to do this efficiently, reading only the stripes that contain 
the requested rows rather than the whole file.
   To do this, I need the number of rows in each stripe.
   Looking at the code, this information appears to be known internally but 
is not exposed through the API.
   See: 
https://github.com/apache/arrow/blob/b4a8320890c6658f948e025f522db5f125a1f8dc/cpp/src/arrow/adapters/orc/adapter.cc#L207-L212
   Is that correct?
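A minimal sketch of how per-stripe row counts would be used, assuming they were available as a `std::vector<int64_t>`. `StripeSlice` and `StripesForRowRange` are hypothetical names, not part of Arrow. The function walks the stripes with a running row offset and returns, for each stripe that overlaps the half-open range `[first_row, first_row + num_rows)`, the stripe index (what would be passed to `ReadStripe(...)`) plus which rows of that stripe to keep.

```cpp
#include <algorithm>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical: the part of one stripe that overlaps the requested row range.
struct StripeSlice {
  int64_t stripe;  // stripe index, e.g. the argument to ReadStripe()
  int64_t offset;  // first row to keep, relative to the start of the stripe
  int64_t length;  // number of rows to keep from this stripe
};

// Hypothetical helper: given the number of rows in each stripe (assumed to be
// exposed by the reader), return the stripes overlapping
// [first_row, first_row + num_rows) and the row window to keep in each.
std::vector<StripeSlice> StripesForRowRange(
    const std::vector<int64_t>& rows_per_stripe, int64_t first_row,
    int64_t num_rows) {
  if (first_row < 0 || num_rows < 0) {
    throw std::invalid_argument("row range must be non-negative");
  }
  std::vector<StripeSlice> slices;
  const int64_t end_row = first_row + num_rows;  // exclusive
  int64_t stripe_start = 0;  // file-level index of the stripe's first row
  for (int64_t i = 0; i < static_cast<int64_t>(rows_per_stripe.size()); ++i) {
    if (stripe_start >= end_row) break;  // remaining stripes are past the range
    const int64_t stripe_end = stripe_start + rows_per_stripe[i];  // exclusive
    const int64_t lo = std::max(first_row, stripe_start);
    const int64_t hi = std::min(end_row, stripe_end);
    if (hi > lo) {  // non-empty overlap with the requested range
      slices.push_back({i, lo - stripe_start, hi - lo});
    }
    stripe_start = stripe_end;
  }
  return slices;
}
```

For example, with three stripes of 4000 rows each and rows 5000 through 6499 requested (`first_row = 5000`, `num_rows = 1500`), only stripe 1 is returned, with `offset = 1000` and `length = 1500`; stripes 0 and 2 never need to be read.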
   Another possible approach is to call "Seek()" and then "NextStripeReader()", 
but if I understand correctly this only reads one stripe, and I would rather 
use the "ReadStripe(...)" method.
   If so, I am willing to add a method to retrieve this. What should it be 
called? NumberOfRows(int64_t stripe)?
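A hypothetical sketch of what the proposed accessor could look like, shown on a stand-in class so it compiles on its own. `StripeRowCounts` and `FirstRowOfStripe` are illustrative names, not Arrow APIs; only the `NumberOfRows(int64_t stripe)` signature comes from the proposal above.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical stand-in for a reader that already holds stripe metadata.
// This is NOT Arrow's ORCFileReader; it only illustrates the proposed accessor.
class StripeRowCounts {
 public:
  explicit StripeRowCounts(std::vector<int64_t> rows_per_stripe)
      : rows_per_stripe_(std::move(rows_per_stripe)) {}

  int64_t NumberOfStripes() const {
    return static_cast<int64_t>(rows_per_stripe_.size());
  }

  // Proposed accessor: number of rows in a single stripe.
  int64_t NumberOfRows(int64_t stripe) const {
    CheckIndex(stripe);
    return rows_per_stripe_[static_cast<std::size_t>(stripe)];
  }

  // Hypothetical convenience: file-level index of a stripe's first row,
  // computed by summing the row counts of all preceding stripes.
  int64_t FirstRowOfStripe(int64_t stripe) const {
    CheckIndex(stripe);
    int64_t first = 0;
    for (int64_t i = 0; i < stripe; ++i) {
      first += rows_per_stripe_[static_cast<std::size_t>(i)];
    }
    return first;
  }

 private:
  // Rejects indices outside [0, NumberOfStripes()).
  void CheckIndex(int64_t stripe) const {
    if (stripe < 0 || stripe >= NumberOfStripes()) {
      throw std::out_of_range("stripe index out of range");
    }
  }

  std::vector<int64_t> rows_per_stripe_;
};
```

The sketch throws `std::out_of_range` for an invalid index only to stay self-contained; an addition to Arrow's `ORCFileReader` would more likely report errors through `arrow::Status` or `arrow::Result`, as the rest of that API does.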


-- 
This is an automated message from the Apache Git Service.