liurenjie1024 commented on PR #1824:
URL: https://github.com/apache/iceberg-rust/pull/1824#issuecomment-3509095318

   > > Thanks @gbrgr for this PR. But I think we need to rethink how to compute
the `_file` and `_pos` metadata columns. While it's fairly trivial to compute
`_file`, it's non-trivial to compute `_pos` efficiently, because some row groups
have already been filtered out when we read Parquet files. I think the best way
is to push reading these two columns down to arrow-rs.
   > 
   > @liurenjie1024 I agree for `_pos`, and we have a PR there:
[apache/arrow-rs#8715](https://github.com/apache/arrow-rs/pull/8715). But
`_file` seems like something that arrow-rs doesn't need to know about.
Similarly, in the future, for `_row_id` from the V3 spec, we cannot expect
arrow-rs to be responsible for computing that one.
   > 
   > How do we go forward with rethinking this? What would be the action items
for us?
   
   Hi @vustef, I also agree that we should handle `_file` in iceberg-rust, and I
left some comments about how to proceed.
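
To make the `_pos` concern from the quoted thread concrete, here is a minimal sketch of why skipped row groups matter. It assumes we already know the row count of every row group in the file (for example from the Parquet footer) and which row groups survived filtering. The function name `first_positions` and its parameters are hypothetical and are not part of iceberg-rust or arrow-rs.

```rust
/// Hypothetical helper (not an iceberg-rust or arrow-rs API).
///
/// `row_group_rows[i]` is the number of rows in row group `i` of the file.
/// `selected` holds the indices of the row groups that were not filtered out.
/// Returns the file-level `_pos` of the first row of each selected row group.
fn first_positions(row_group_rows: &[u64], selected: &[usize]) -> Vec<u64> {
    // Prefix sums: offsets[i] is the number of rows that precede row group i.
    let mut offsets = Vec::with_capacity(row_group_rows.len());
    let mut total = 0u64;
    for &n in row_group_rows {
        offsets.push(total);
        total += n;
    }
    // A skipped row group still advances `_pos` for everything after it, so
    // positions come from the file-wide offsets, not from a running count
    // of the rows that were actually read.
    selected.iter().map(|&i| offsets[i]).collect()
}
```

For row groups of 100, 250, 50, and 400 rows with only groups 1 and 3 selected, the sketch yields starting positions 100 and 400. A reader that simply counted the rows it returned would instead start the second selected group at 250, which is the kind of error the thread wants to avoid by tracking positions inside the Parquet reader.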


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

