4ertus2 opened a new issue, #10394: URL: https://github.com/apache/incubator-gluten/issues/10394
### Backend

VL (Velox)

### Bug description

The [Iceberg spec](https://iceberg.apache.org/spec/#parquet) requires that field IDs be set:

> Column IDs are required to be stored as [field IDs](http://github.com/apache/parquet-format/blob/40699d05bd24181de6b1457babbee2c16dce3803/src/main/thrift/parquet.thrift#L459) on the parquet schema.

As far as I can tell, the actual column IDs from the Iceberg schema are not passed here, so Velox cannot write them later:

https://github.com/apache/incubator-gluten/blob/2ec3ba751821d5e09a4da630c2b55e8a1a3ccb1b/cpp/velox/compute/VeloxRuntime.cc#L232

It looks like it would be possible to pass the IDs on the Velox side through `IcebergColumnHandle` after this [PR](https://github.com/facebookincubator/velox/pull/14272/files#diff-9113b137842f261f316379c01b28852014c7d6ee19c0d74c0f9f757f5ed34ce3R23).

Am I right that there is currently no information about the actual Iceberg column IDs in `Java_org_apache_gluten_execution_IcebergWriteJniWrapper_init`?

### Gluten version

main branch

### Spark version

None

### Spark configurations

_No response_

### System information

_No response_

### Relevant logs

```bash
```
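To illustrate the bug description above: a minimal sketch, not Gluten or Velox code, of the per-column information that would presumably have to reach the native writer. It assumes the Iceberg schema is available in the JSON form defined by the Iceberg spec (`id` on struct fields, `element-id` on lists, `key-id` and `value-id` on maps). The helper `collect_field_ids`, the sample schema, and the dotted path naming are hypothetical and chosen only for this example.

```python
import json


def collect_field_ids(node, path=()):
    """Yield (dotted_path, field_id) for every ID-carrying field of an Iceberg type."""
    if not isinstance(node, dict):
        return  # primitive type such as "long" or "string": nothing nested
    kind = node.get("type")
    if kind == "struct":
        for field in node["fields"]:
            child = path + (field["name"],)
            yield ".".join(child), field["id"]
            yield from collect_field_ids(field["type"], child)
    elif kind == "list":
        child = path + ("element",)
        yield ".".join(child), node["element-id"]
        yield from collect_field_ids(node["element"], child)
    elif kind == "map":
        for part in ("key", "value"):
            child = path + (part,)
            yield ".".join(child), node[part + "-id"]
            yield from collect_field_ids(node[part], child)


# Hypothetical sample schema in Iceberg's JSON schema layout.
SCHEMA_JSON = """
{
  "type": "struct",
  "fields": [
    {"id": 1, "name": "id", "required": true, "type": "long"},
    {"id": 2, "name": "tags", "required": false,
     "type": {"type": "list", "element-id": 4,
              "element": "string", "element-required": true}},
    {"id": 3, "name": "point", "required": false,
     "type": {"type": "struct", "fields": [
       {"id": 5, "name": "x", "required": true, "type": "double"},
       {"id": 6, "name": "y", "required": true, "type": "double"}]}}
  ]
}
"""

field_ids = dict(collect_field_ids(json.loads(SCHEMA_JSON)))
for column_path, field_id in field_ids.items():
    print(f"{column_path}: field_id={field_id}")
```

Note that nested fields carry their own IDs, so passing only top-level column IDs would likely not be enough for list, map, or struct columns.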
