alamb commented on PR #7307:
URL: https://github.com/apache/arrow-rs/pull/7307#issuecomment-3312598827

   > > row numbers are a pretty fundamental feature that's very hard to emulate 
in higher layers if the parquet reader doesn't support them
   > 
   > +1 on this pain point: working around this lack of capability from a client 
perspective is very challenging and comes with a bunch of correctness risks 
(e.g. we can write some rowIndex column client-side, but then we have to be 
100% sure that the emitted rowIndex will perfectly match the Parquet files, 
which can get quite tricky especially in multi-threaded executions etc.)
   > 
   > Would love to see this feature land in arrow-rs + DataFusion
   
   Thanks @16pierre  -- I agree there is no good workaround for adding row 
numbers to the output of the parquet reader
   
   I think the biggest thing we need to do is to sort out the API for "how does 
a user request the (virtual) row number column", as today's `ProjectionMask` is 
insufficient
   
   @scovich and @etseidl 's idea to use some sort of Arrow metadata is 
interesting, but I am not quite sure how it would look
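
To make the metadata idea a bit more concrete, here is a rough sketch of one way it might look. It is only an illustration, not a proposed arrow-rs API: the `Field` struct is a std-only stand-in for an Arrow field, and the metadata key `parquet.virtual_column`, the value `row_number`, and the helper `split_request` are all hypothetical names made up for this example. The idea shown is that the caller marks a field in the requested schema as virtual, and the reader separates those fields from the columns it actually has to decode from the file.

```rust
use std::collections::HashMap;

// Hypothetical metadata key/value; neither is defined by arrow-rs or Parquet.
const VIRTUAL_KEY: &str = "parquet.virtual_column";
const ROW_NUMBER: &str = "row_number";

/// Stand-in for an Arrow field: a name plus string key/value metadata.
#[derive(Debug, Clone)]
struct Field {
    name: String,
    metadata: HashMap<String, String>,
}

impl Field {
    fn new(name: &str) -> Self {
        Field {
            name: name.to_string(),
            metadata: HashMap::new(),
        }
    }

    /// Attach one metadata entry, builder style.
    fn with_metadata(mut self, key: &str, value: &str) -> Self {
        self.metadata.insert(key.to_string(), value.to_string());
        self
    }

    /// True if the field is tagged as the virtual row number column.
    fn is_row_number(&self) -> bool {
        self.metadata.get(VIRTUAL_KEY).map(String::as_str) == Some(ROW_NUMBER)
    }
}

/// Split a requested schema into the physical column names to decode from
/// the file and the output positions the reader must fill with row numbers.
fn split_request(fields: &[Field]) -> (Vec<String>, Vec<usize>) {
    let mut physical = Vec::new();
    let mut virtual_positions = Vec::new();
    for (i, field) in fields.iter().enumerate() {
        if field.is_row_number() {
            virtual_positions.push(i);
        } else {
            physical.push(field.name.clone());
        }
    }
    (physical, virtual_positions)
}
```

With something along these lines, a request for `id`, a tagged `row_idx` field, and `name` would decode only `id` and `name` from the file and leave output position 1 for the reader to fill in. Whether the marker should live in field metadata, an extension type, or a separate reader option is exactly the open question.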
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
