GitHub user alamb added a comment to the discussion: How does 'sort' interact with record batches?
Sorry, I just re-read this:

> My goal is that I will have a fully sorted file sorted by primary key where
> each fileRowNumber is the index of that row in the file.

I am not sure you will be able to do this today in DataFusion with a table that has multiple files, as I don't think there is any way to tell DataFusion to keep the data segregated by file. You could probably do it by scanning each file individually.

There has also been talk of adding more "metadata" columns to listing tables, for example this one from @phillipleblanc - https://github.com/apache/datafusion/pull/15181

I don't think that PR alone would get you row numbers within a file, but it is the way I think we would need to do it.

GitHub link: https://github.com/apache/datafusion/discussions/15711#discussioncomment-12979861

----
This is an automatically sent email for github@datafusion.apache.org.
To unsubscribe, please send an email to: github-unsubscr...@datafusion.apache.org
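
For illustration only, here is a rough sketch of the "scan each file individually" idea mentioned in the comment. It uses plain Python and the standard `csv` module rather than any DataFusion API, and the file names, the `pk` column, and the `file_row_number` column are all hypothetical stand-ins. Each file is read on its own so that every row can be tagged with its index within that file before the rows are combined and sorted by primary key.

```python
import csv
import io


def scan_with_row_numbers(name, text):
    """Scan one file on its own and tag each row with its index in that file."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {**row, "file": name, "file_row_number": i}
        for i, row in enumerate(reader)
    ]


# Hypothetical stand-ins for two data files (in-memory CSV text).
files = {
    "a.csv": "pk,value\n3,c\n1,a\n",
    "b.csv": "pk,value\n2,b\n4,d\n",
}

# Scan file by file so the row numbers stay tied to their source file.
rows = []
for name, text in files.items():
    rows.extend(scan_with_row_numbers(name, text))

# Combine and sort by the (assumed integer) primary key.
rows.sort(key=lambda r: int(r["pk"]))

for r in rows:
    print(r["pk"], r["file"], r["file_row_number"])
```

Because the row number is attached while a single file is being read, it survives the later sort; in this sketch that is the step a multi-file scan would have to preserve.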