LouisClt opened a new issue, #14673: URL: https://github.com/apache/arrow/issues/14673
Hello,

I need to read only a subset of an ORC file (say, records 5000 to 6500). The goal is to do this efficiently by reading only the stripes that contain the data, not the whole file. To do this, I need the number of rows in each stripe.

Looking at the code, it seems this is known in the implementation but not exposed at the API level. See: https://github.com/apache/arrow/blob/b4a8320890c6658f948e025f522db5f125a1f8dc/cpp/src/arrow/adapters/orc/adapter.cc#L207-L212

Is that correct?

Another possible approach is to use the "seek()" method and then the "NextStripeReader" method, but if I am correct this only allows reading one stripe, and I would also prefer to use the "ReadStripe(...)" method.

If this is correct, I am willing to add a method to retrieve the number of rows in a stripe. What should it be called? NumberOfRows(int64_t stripe)?
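To illustrate why the per-stripe row counts matter, here is a minimal sketch of the stripe selection they would make possible. It makes no Arrow calls. `StripeSlice` and `StripesForRowRange` are hypothetical names, not part of the ORC adapter, and `rows_per_stripe` stands in for whatever the proposed accessor would return. The row range is assumed to be half-open, [first_row, end_row).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper type: which part of one stripe overlaps the wanted rows.
struct StripeSlice {
  int64_t stripe;      // index of the stripe in the file
  int64_t row_offset;  // first wanted row, relative to the start of the stripe
  int64_t row_count;   // number of wanted rows inside this stripe
};

// Returns the stripes that overlap the half-open row range
// [first_row, end_row), given the number of rows in each stripe.
// rows_per_stripe is assumed to come from the accessor proposed above.
std::vector<StripeSlice> StripesForRowRange(
    const std::vector<int64_t>& rows_per_stripe, int64_t first_row,
    int64_t end_row) {
  assert(0 <= first_row && first_row <= end_row);
  std::vector<StripeSlice> slices;
  int64_t stripe_start = 0;  // file-level index of the stripe's first row
  for (std::size_t i = 0; i < rows_per_stripe.size(); ++i) {
    const int64_t stripe_end = stripe_start + rows_per_stripe[i];
    const int64_t lo = std::max(first_row, stripe_start);
    const int64_t hi = std::min(end_row, stripe_end);
    if (lo < hi) {
      slices.push_back({static_cast<int64_t>(i), lo - stripe_start, hi - lo});
    }
    stripe_start = stripe_end;
    if (stripe_start >= end_row) break;  // later stripes cannot overlap
  }
  return slices;
}
```

Each returned stripe index could then be passed to "ReadStripe(...)", and the resulting batch trimmed using `row_offset` and `row_count`, so that stripes outside the requested range are never read.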
