fatemehp commented on issue #37559: URL: https://github.com/apache/arrow/issues/37559#issuecomment-1836501789
I added a comment to the doc. I think we should think more about the format of the ranges, in terms of both memory and performance. I would consider analyzing the following options before making a decision:

1) A set of runs of 0s and 1s. These could be storage-optimized/encoded and cheaply read.
2) Storing num_to_read/num_to_skip instead of from/to. If we limit the size of the runs, we could store them in uint16_t instead of the proposed int64_t for from/to.
3) A storage-optimized bitmap such as a RoaringBitmap.
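
To make option 2 concrete, here is a rough sketch of how from/to ranges could be re-encoded as bounded num_to_skip/num_to_read pairs. Everything in it is an assumption made for illustration: the RowRange and Run structs, the EncodeRuns and DecodeRuns helpers, the inclusive from/to bounds, and the rule for splitting oversized runs are hypothetical and are not taken from the design doc or from the Arrow/Parquet API.

```cpp
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <vector>

// Hypothetical from/to representation; bounds are assumed inclusive.
struct RowRange {
  int64_t from;
  int64_t to;
};

// Hypothetical compact representation: skip N rows, then read M rows.
struct Run {
  uint16_t num_to_skip;
  uint16_t num_to_read;
};

// Converts sorted, non-overlapping ranges into runs. A skip or read that
// does not fit in uint16_t is split across several runs.
std::vector<Run> EncodeRuns(const std::vector<RowRange>& ranges) {
  constexpr int64_t kMax = std::numeric_limits<uint16_t>::max();
  std::vector<Run> runs;
  int64_t cursor = 0;  // first row not yet described by any run
  for (const RowRange& r : ranges) {
    if (r.from < cursor || r.to < r.from) {
      throw std::invalid_argument("ranges must be sorted and non-overlapping");
    }
    int64_t skip = r.from - cursor;
    int64_t read = r.to - r.from + 1;
    while (skip > kMax) {  // emit skip-only runs for very large gaps
      runs.push_back({static_cast<uint16_t>(kMax), 0});
      skip -= kMax;
    }
    while (read > kMax) {  // emit full-size reads for very long ranges
      runs.push_back({static_cast<uint16_t>(skip), static_cast<uint16_t>(kMax)});
      skip = 0;
      read -= kMax;
    }
    runs.push_back({static_cast<uint16_t>(skip), static_cast<uint16_t>(read)});
    cursor = r.to + 1;
  }
  return runs;
}

// Rebuilds from/to ranges from runs, merging reads that touch each other.
std::vector<RowRange> DecodeRuns(const std::vector<Run>& runs) {
  std::vector<RowRange> out;
  int64_t cursor = 0;
  for (const Run& run : runs) {
    cursor += run.num_to_skip;
    if (run.num_to_read == 0) continue;  // skip-only run
    const int64_t from = cursor;
    cursor += run.num_to_read;
    if (!out.empty() && out.back().to + 1 == from) {
      out.back().to = cursor - 1;
    } else {
      out.push_back({from, cursor - 1});
    }
  }
  return out;
}
```

In this sketch a Run takes 4 bytes versus 16 bytes for a RowRange, but gaps or ranges longer than 65535 rows need extra runs, so the saving depends on how selective and how fragmented the ranges are. Adjacent input ranges also come back merged after decoding, which selects the same rows but is not a byte-for-byte round trip.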
