4ertus2 opened a new issue, #10394:
URL: https://github.com/apache/incubator-gluten/issues/10394

   ### Backend
   
   VL (Velox)
   
   ### Bug description
   
   The [Iceberg spec](https://iceberg.apache.org/spec/#parquet) requires field IDs to be set:
   
   > Column IDs are required to be stored as [field IDs](http://github.com/apache/parquet-format/blob/40699d05bd24181de6b1457babbee2c16dce3803/src/main/thrift/parquet.thrift#L459) on the parquet schema.
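
For illustration only, here is a minimal Python sketch (not Gluten or Velox code) of which IDs are involved. It walks a schema in Iceberg's JSON serialization and collects every field ID, including the nested element, key, and value IDs, since these are the values a Parquet writer would need to receive. The example schema and the helper names `iter_field_ids` and `collect_field_ids` are hypothetical.

```python
from typing import Dict, Iterator, Tuple

# Hypothetical example schema, written in Iceberg's JSON schema serialization.
EXAMPLE_SCHEMA = {
    "type": "struct",
    "fields": [
        {"id": 1, "name": "id", "required": True, "type": "long"},
        {"id": 2, "name": "tags", "required": False,
         "type": {"type": "list", "element-id": 4,
                  "element-required": True, "element": "string"}},
        {"id": 3, "name": "point", "required": False,
         "type": {"type": "struct", "fields": [
             {"id": 5, "name": "x", "required": True, "type": "double"},
             {"id": 6, "name": "y", "required": True, "type": "double"},
         ]}},
    ],
}


def iter_field_ids(type_node, prefix: str = "") -> Iterator[Tuple[str, int]]:
    """Yield (dotted column path, Iceberg field ID) for every nested field."""
    if not isinstance(type_node, dict):
        return  # primitive types are plain strings and carry no IDs
    kind = type_node.get("type")
    if kind == "struct":
        for field in type_node["fields"]:
            path = prefix + field["name"]
            yield path, field["id"]
            yield from iter_field_ids(field["type"], path + ".")
    elif kind == "list":
        yield prefix + "element", type_node["element-id"]
        yield from iter_field_ids(type_node["element"], prefix + "element.")
    elif kind == "map":
        yield prefix + "key", type_node["key-id"]
        yield from iter_field_ids(type_node["key"], prefix + "key.")
        yield prefix + "value", type_node["value-id"]
        yield from iter_field_ids(type_node["value"], prefix + "value.")


def collect_field_ids(schema) -> Dict[str, int]:
    """Return a mapping from dotted column path to Iceberg field ID."""
    return dict(iter_field_ids(schema))
```

Calling `collect_field_ids(EXAMPLE_SCHEMA)` returns a path-to-ID mapping that covers nested fields as well as top-level columns. Note that the IDs are assigned by the table schema and are not positional, so the writer cannot derive them from column order alone.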
   
   As far as I understand, the actual column IDs from the Iceberg schema are not passed here, so Velox cannot write them later:
   
   https://github.com/apache/incubator-gluten/blob/2ec3ba751821d5e09a4da630c2b55e8a1a3ccb1b/cpp/velox/compute/VeloxRuntime.cc#L232
   
   It looks like it would be possible to pass the IDs on the Velox side through `IcebergColumnHandle` after this [PR](https://github.com/facebookincubator/velox/pull/14272/files#diff-9113b137842f261f316379c01b28852014c7d6ee19c0d74c0f9f757f5ed34ce3R23).
   
   Am I right that there is currently no information about the actual Iceberg column IDs in `Java_org_apache_gluten_execution_IcebergWriteJniWrapper_init`?
   
   ### Gluten version
   
   main branch
   
   ### Spark version
   
   None
   
   ### Spark configurations
   
   _No response_
   
   ### System information
   
   _No response_
   
   ### Relevant logs
   
   ```bash
   
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

