greenlaw commented on issue #13218:
URL: https://github.com/apache/iceberg/issues/13218#issuecomment-3225610550

   @royantman Unfortunately I can't use your workaround (or at least not 
easily): as you mentioned, it _requires_ forking the library, because 
`ParquetMetrics` is a private class.  I really don't want to migrate 
existing code over to Go either.
   
   I agree the workflow needs to be well-defined.  But I do believe the 
"bring your own parquet files" use case is an important one for Iceberg 
clients to support.  In many cases it's not reasonable to require existing data 
to be passed through and rewritten by an Iceberg driver when the mapping from 
parquet file schema to table schema can simply be passed in at ingest time (as 
was possible with v1.8.x).
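
   To make "passing the mapping in at ingest time" concrete, here is a minimal sketch of resolving parquet column names to table field IDs by name. It is not Iceberg's API: `TableField`, `map_columns_to_field_ids`, and the sample column names are all hypothetical, and it assumes the column names have already been read from the file footer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableField:
    """A column in a hypothetical table schema, identified by its field ID."""
    field_id: int
    name: str


def map_columns_to_field_ids(file_columns, table_fields):
    """Resolve parquet column names to table field IDs by exact name match.

    Returns (mapping, unmatched): a dict of column name -> field ID, and a
    list of file columns that have no counterpart in the table schema.
    """
    ids_by_name = {field.name: field.field_id for field in table_fields}
    mapping = {}
    unmatched = []
    for column in file_columns:
        if column in ids_by_name:
            mapping[column] = ids_by_name[column]
        else:
            unmatched.append(column)
    return mapping, unmatched


# Hypothetical table schema and footer column names, for illustration only.
table_fields = [TableField(1, "id"), TableField(2, "ts"), TableField(3, "payload")]
file_columns = ["id", "ts", "extra"]

mapping, unmatched = map_columns_to_field_ids(file_columns, table_fields)
print(mapping)
print(unmatched)
```

   The point of the sketch is that nothing in the data file has to change: the mapping lives alongside the table schema, and the file only contributes its column names.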
   
   I can confirm that rewriting my parquet files to include the [optional 
PARQUET:field_id 
metadata](https://arrow.apache.org/docs/cpp/parquet.html#parquet-field-id) for 
each column fixes the issue.  But again, that requires the data to be rewritten, 
when a lightweight footer scan + registration step should be enough.  It also 
requires the parquet file itself to be aware of the table schema.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

