Hi Arrow Team,

I would like to suggest an improvement to Acero's Scan node. It currently provides useful information such as __fragment_index, __batch_index, __filename, and __last_in_fragment. However, it would be beneficial to have an additional column that returns an overall "row index" from the source.
The row index would start from zero and increment for each row retrieved from the source, particularly in the case of Parquet files. Is it currently possible to obtain this row index, or would expanding the Scan node's behavior be required?

Having this row index column would be valuable in implementing support for Iceberg's position-based delete files, as outlined in the following link: https://iceberg.apache.org/spec/#delete-formats

Iceberg's value-based (equality) deletes can already be performed using the existing support for anti joins. A row index, however, cannot be reliably generated with a projection node, because row ordering is not guaranteed within an Acero graph. Hence, a dedicated row index column produced by the Scan node itself would provide a more reliable solution in this context.

Thank you for considering this suggestion.

Rusty
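
P.S. To make the request concrete, here is a rough sketch in plain Python of the semantics I have in mind. It does not use Acero or any Arrow API; the function names with_row_index and apply_position_deletes are hypothetical, and the batches are plain lists standing in for record batches read in order from a single file. It only illustrates a zero-based row index that keeps counting across batch boundaries, and how that index could be used to drop rows named by a position delete file.

```python
def with_row_index(batches):
    """Yield (row_index, row) pairs for batches read in source order.

    Hypothetical helper: the index starts at zero and keeps counting
    across batch boundaries, so it identifies a row's position in the
    source file rather than its position within a single batch.
    """
    offset = 0  # number of rows seen in all previous batches
    for batch in batches:
        for i, row in enumerate(batch):
            yield offset + i, row
        offset += len(batch)


def apply_position_deletes(batches, deleted_positions):
    """Return the rows whose row index is not listed as deleted.

    Hypothetical helper: deleted_positions stands in for the row
    positions that a position delete file lists for one data file.
    """
    deleted = set(deleted_positions)
    return [row for idx, row in with_row_index(batches) if idx not in deleted]


# Three "batches" standing in for the record batches of one file.
batches = [["a", "b", "c"], ["d", "e"], ["f"]]

indexed = list(with_row_index(batches))
kept = apply_position_deletes(batches, [1, 4])

print(indexed)
print(kept)
```

The important point is that the index is only meaningful if it is assigned where the rows are read, while the source order is still known. That is why I am asking for it in the Scan node rather than trying to compute it downstream.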