alamb commented on PR #7307: URL: https://github.com/apache/arrow-rs/pull/7307#issuecomment-3312598827

> > row numbers are a pretty fundamental feature that's very hard to emulate in higher layers if the parquet reader doesn't support them
>
> +1 on this pain point. Working around this lack of capability from a client perspective is very challenging and comes with a bunch of correctness risks (e.g. we can write some rowIndex column client-side, but then we have to be 100% sure that the emitted rowIndex will perfectly match the Parquet files, which can get quite tricky, especially in multi-threaded executions).
>
> Would love to see this feature landing in Arrow-rs + Datafusion

Thanks @16pierre -- I agree there is no good workaround for adding row numbers to the output of the parquet reader.

I think the biggest thing we need to do is to sort out the API for "how does a user request the (virtual) row number column", as today's `ProjectionMask` is insufficient.

@scovich and @etseidl's idea to use some sort of Arrow metadata is interesting, but I am not quite sure how it would look.
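
To make the correctness risk mentioned above concrete, here is a minimal sketch in plain Rust. It does not use the parquet crate: `RowGroup`, `row_group_offsets`, and `emitted_row_numbers` are hypothetical names chosen for illustration, not arrow-rs APIs. It assumes the reader skips whole row groups (for example after statistics pruning) and shows that the file-level row numbers of the emitted rows then have gaps that a client-side counter would not reproduce.

```rust
/// Hypothetical stand-in for row group metadata: only the row count matters here.
struct RowGroup {
    num_rows: u64,
}

/// File-level row number of the first row in each row group.
fn row_group_offsets(groups: &[RowGroup]) -> Vec<u64> {
    let mut offsets = Vec::with_capacity(groups.len());
    let mut next = 0u64;
    for g in groups {
        offsets.push(next);
        next += g.num_rows;
    }
    offsets
}

/// File-level row numbers of the rows emitted when only the row groups
/// listed in `selected` (by index) are read.
fn emitted_row_numbers(groups: &[RowGroup], selected: &[usize]) -> Vec<u64> {
    let offsets = row_group_offsets(groups);
    selected
        .iter()
        .flat_map(|&i| {
            let start = offsets[i];
            start..start + groups[i].num_rows
        })
        .collect()
}

fn main() {
    let groups = [
        RowGroup { num_rows: 3 },
        RowGroup { num_rows: 2 },
        RowGroup { num_rows: 4 },
    ];
    // Row group 1 is skipped, so file rows 3 and 4 never reach the client.
    let rows = emitted_row_numbers(&groups, &[0, 2]);
    println!("{:?}", rows); // prints [0, 1, 2, 5, 6, 7, 8]
}
```

A client that simply counts the rows it receives would label these seven rows 0 through 6, which disagrees with the file positions from the fourth row onward. Only the reader knows which rows were skipped, which is one reason the numbering is hard to emulate in higher layers.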