emkornfield commented on issue #37559: URL: https://github.com/apache/arrow/issues/37559#issuecomment-1828913912
> > Finally I have got some time to complete the design doc drafted by @mapleFU: https://docs.google.com/document/d/1SeVcYudu6uD9rb9zRAnlLGgdauutaNZlAaS0gVzjkgM/.
>
> This proposes a number of reader APIs based on row ranges, but never says how row ranges are computed in the first place?

@pitrou I think the intent is that it is specifically abstract. I think there are a few different methods to produce the ranges:

1. RowGroup level selection.
2. Page level selection via indices.
3. "Deletion vectors" from OSS table formats like Delta Lake and Iceberg, which specify which rows in the file are logically deleted.
4. Using something like Arrow compute to select specific rows based on a few columns and construct the ranges.

@wgtmac I think this is partially covered in the design but it would be good to maybe make this more explicit.
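To make methods 3 and 4 above a bit more concrete, here is a rough sketch of how a caller might turn a deletion vector or a boolean selection mask into row ranges. It is plain Python and not the API from the design doc: the function names and the inclusive `(first_row, last_row)` tuple representation are assumptions made only for illustration.

```python
from typing import Iterable, List, Tuple

# Hypothetical representation: inclusive (first_row, last_row) positions in the file.
RowRange = Tuple[int, int]


def ranges_from_mask(mask: Iterable[bool]) -> List[RowRange]:
    """Collapse a per-row keep/drop mask into sorted, non-overlapping ranges.

    The mask could come from evaluating a filter on a few columns (method 4).
    """
    ranges: List[RowRange] = []
    start = None  # first row of the run currently being built, if any
    last = -1     # index of the last row seen
    for i, keep in enumerate(mask):
        last = i
        if keep and start is None:
            start = i                      # a new run of selected rows begins
        elif not keep and start is not None:
            ranges.append((start, i - 1))  # the run ended on the previous row
            start = None
    if start is not None:
        ranges.append((start, last))       # close a run that reaches the end
    return ranges


def ranges_from_deleted_rows(num_rows: int, deleted: Iterable[int]) -> List[RowRange]:
    """Invert a set of deleted row positions (method 3) into surviving ranges."""
    ranges: List[RowRange] = []
    next_row = 0  # first row not yet covered by a range or a deletion
    for d in sorted(set(deleted)):
        if d < 0 or d >= num_rows:
            raise ValueError(f"deleted row {d} is outside [0, {num_rows})")
        if d > next_row:
            ranges.append((next_row, d - 1))
        next_row = d + 1
    if next_row < num_rows:
        ranges.append((next_row, num_rows - 1))
    return ranges
```

For example, `ranges_from_deleted_rows(6, [2, 3])` gives `[(0, 1), (4, 5)]`. Either way the reader only sees ranges, which is why the reader API can stay agnostic about where they came from.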
