emkornfield commented on issue #37559:
URL: https://github.com/apache/arrow/issues/37559#issuecomment-1828913912

   > > Finally I have got some time to complete the design doc drafted by 
@mapleFU: 
https://docs.google.com/document/d/1SeVcYudu6uD9rb9zRAnlLGgdauutaNZlAaS0gVzjkgM/.
   > 
   > This proposes a number of reader APIs based on row ranges, but never says 
how row ranges are computed in the first place?
   
   @pitrou I think the intent is that it is intentionally abstract.  I think 
there are a few different methods to produce the ranges:
   1. Row-group-level selection.
   2. Page-level selection via indices.
   3. "Deletion vectors" from OSS table formats like Delta Lake and Iceberg, 
which specify which rows in the file are logically deleted.
   4. Using something like Arrow compute to select specific rows based on a 
few columns and construct the ranges.
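
As a rough illustration of methods 3 and 4 (this is not taken from the design doc), the sketch below shows how row ranges might be derived from a list of deleted row positions or from a boolean selection mask. The names `RowRange`, `ranges_from_deleted`, and `ranges_from_mask` are hypothetical, and the inclusive `(first_row, last_row)` representation is an assumption; the actual reader API may represent ranges differently.

```python
from typing import Iterable, List, Sequence, Tuple

# Hypothetical representation: inclusive (first_row, last_row) within one file.
RowRange = Tuple[int, int]


def ranges_from_deleted(num_rows: int, deleted: Iterable[int]) -> List[RowRange]:
    """Turn deleted row positions (a 'deletion vector') into ranges of rows to keep."""
    ranges: List[RowRange] = []
    start = 0  # first row not yet covered by a range or a deletion
    for pos in sorted(set(deleted)):
        if pos < 0 or pos >= num_rows:
            raise ValueError(f"deleted position {pos} is outside [0, {num_rows})")
        if pos > start:
            ranges.append((start, pos - 1))
        start = pos + 1
    if start < num_rows:
        ranges.append((start, num_rows - 1))
    return ranges


def ranges_from_mask(mask: Sequence[bool]) -> List[RowRange]:
    """Turn a per-row boolean mask (e.g. the result of a filter) into ranges."""
    ranges: List[RowRange] = []
    start = None  # start of the current run of selected rows, if any
    for i, keep in enumerate(mask):
        if keep and start is None:
            start = i
        elif not keep and start is not None:
            ranges.append((start, i - 1))
            start = None
    if start is not None:
        ranges.append((start, len(mask) - 1))
    return ranges


print(ranges_from_deleted(10, [3, 4, 8]))           # [(0, 2), (5, 7), (9, 9)]
print(ranges_from_mask([True, True, False, True]))  # [(0, 1), (3, 3)]
```

Either way the reader only sees the resulting ranges, which is what lets the range-based API stay agnostic about how they were produced.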
   
   @wgtmac I think this is partially covered in the design, but it would be 
good to make it more explicit.


