GitHub user alamb added a comment to the discussion: How does 'sort' interact 
with record batches?

Sorry, I just re-read this:

> My goal is that I will have a fully sorted file sorted by primary key where 
> each fileRowNumber is the index of that row in the file.

I am not sure you will be able to do this today in DataFusion with a table that
has multiple files, as I don't think there is any way to tell DataFusion to keep
the data segregated by file.

You could probably do it by scanning each file individually.
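
To make the per-file idea concrete, here is a minimal, engine-agnostic sketch in
plain Python. It is not DataFusion API: CSV files stand in for the real file
format, and the function names, the `pk` key column, and the `file` /
`fileRowNumber` output columns are all assumptions made for illustration. The
point is only that when each file is read on its own, the row index is per file
by construction, and the global sort by primary key can happen afterwards.

```python
import csv
from pathlib import Path


def scan_file(path):
    """Read one CSV file and tag each row with its file name and 0-based index."""
    with open(path, newline="") as f:
        for row_number, row in enumerate(csv.DictReader(f)):
            # The index is per file because only this one file is being read.
            yield {**row, "file": Path(path).name, "fileRowNumber": row_number}


def scan_files_individually(paths, key="pk"):
    """Scan files one at a time, then sort the combined rows by the key column."""
    rows = [row for path in paths for row in scan_file(path)]
    # CSV values are strings; the key is assumed to be an integer column.
    return sorted(rows, key=lambda row: int(row[key]))
```

In an actual DataFusion setup the same shape would presumably be one scan per
file with the row number attached before the results are combined and sorted,
but I have not tried that here.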

There has also been talk of adding more "metadata" columns to listing tables,
for example this PR from @phillipleblanc:
- https://github.com/apache/datafusion/pull/15181

I don't think that PR alone would get you row numbers within a file, but I think
metadata columns are the way we would need to do it.

GitHub link: 
https://github.com/apache/datafusion/discussions/15711#discussioncomment-12979861

----
This is an automatically sent email for github@datafusion.apache.org.
To unsubscribe, please send an email to: 
github-unsubscr...@datafusion.apache.org

