greenlaw commented on issue #13218: URL: https://github.com/apache/iceberg/issues/13218#issuecomment-3225610550
@royantman Unfortunately I can't use your workaround (at least not easily) because, as you mentioned, it _requires_ forking the library: `ParquetMetrics` is a private class. I also really don't want to migrate existing code over to Go.

I agree the workflow needs to be well-defined. But I do believe "bring your own Parquet files" is an important use case for Iceberg clients to support. In many cases it's not reasonable to require existing data to be passed through and rewritten by an Iceberg driver when the mapping from Parquet file schema to table schema can simply be passed in at ingest time (as was possible with v1.8.x).

I can confirm that rewriting my Parquet files to include the [optional PARQUET:field_id metadata](https://arrow.apache.org/docs/cpp/parquet.html#parquet-field-id) for each column fixes the issue. But again, that requires rewriting the data rather than a lightweight footer scan + registration step. It also requires the Parquet file itself to be aware of the table schema.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

For queries about this service, please contact Infrastructure at: us...@infra.apache.org