jinchengchenghh commented on issue #8948: URL: https://github.com/apache/incubator-gluten/issues/8948#issuecomment-2724825095
The delete scan reads the metadata column `_file`, but that information is not available in Velox (see https://github.com/apache/iceberg/blob/main/spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/source/BatchDataReader.java#L98). In Iceberg, the data file reader creates a constant column for each metadata column:

```java
public static Map<Integer, ?> constantsMap(
    ContentScanTask<?> task,
    Types.StructType partitionType,
    BiFunction<Type, Object, Object> convertConstant) {
  PartitionSpec spec = task.spec();
  StructLike partitionData = task.file().partition();

  // use java.util.HashMap because partition data may contain null values
  Map<Integer, Object> idToConstant = Maps.newHashMap();

  // add _file
  idToConstant.put(
      MetadataColumns.FILE_PATH.fieldId(),
      convertConstant.apply(Types.StringType.get(), task.file().path()));

  // add _spec_id
  idToConstant.put(
      MetadataColumns.SPEC_ID.fieldId(),
      convertConstant.apply(Types.IntegerType.get(), task.file().specId()));

  // add _partition
  if (partitionType != null) {
    if (!partitionType.fields().isEmpty()) {
      StructLike coercedPartition = coercePartition(partitionType, spec, partitionData);
      idToConstant.put(
          MetadataColumns.PARTITION_COLUMN_ID,
          convertConstant.apply(partitionType, coercedPartition));
    } else {
      // use null as some query engines may not be able to handle empty structs
      idToConstant.put(MetadataColumns.PARTITION_COLUMN_ID, null);
    }
  }
  // ... (rest of the method not quoted)
}
```

`PARTITION_COLUMN_ID` (`_partition`) is also a constant column. So we need to separate out the metadata columns, query only the data columns from Velox, and append the `_delete` or `_file` values as extra constant vectors to the data `RowVector`. `_delete` is used in the equality delete file reader test.
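A minimal sketch of the proposed split, assuming a simplified in-memory batch in place of Velox's `RowVector`. `MetadataColumnSketch`, `SimpleBatch`, `dataColumns`, and `appendConstants` are hypothetical names, not Gluten, Velox, or Iceberg APIs. The set of metadata columns treated as constant is also an assumption, and the per-file constants are keyed by column name rather than by Iceberg field ID as `constantsMap` does. Only the data columns are requested from the native scan; each metadata column is then filled with one per-file value repeated for every row.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: none of these names are Gluten, Velox, or Iceberg APIs.
public class MetadataColumnSketch {

  // Assumed set of metadata columns whose value is constant for all rows of one data file.
  static final Set<String> CONSTANT_METADATA = Set.of("_file", "_spec_id", "_partition");

  // Stand-in for a Velox RowVector: ordered column name -> per-row values.
  public static final class SimpleBatch {
    public final Map<String, List<?>> columns = new LinkedHashMap<>();
    public final int numRows;

    public SimpleBatch(int numRows) {
      this.numRows = numRows;
    }
  }

  // Columns the native scan can read from the file itself (everything except metadata).
  public static List<String> dataColumns(List<String> projection) {
    List<String> out = new ArrayList<>();
    for (String name : projection) {
      if (!CONSTANT_METADATA.contains(name)) {
        out.add(name);
      }
    }
    return out;
  }

  // Rebuilds the batch in projection order: data columns come from the native batch,
  // and each metadata column repeats its per-file value once per row.
  public static SimpleBatch appendConstants(
      List<String> projection, SimpleBatch dataBatch, Map<String, Object> fileConstants) {
    SimpleBatch result = new SimpleBatch(dataBatch.numRows);
    for (String name : projection) {
      if (CONSTANT_METADATA.contains(name)) {
        // The value may be null (e.g. _partition for an unpartitioned spec); nCopies allows null.
        result.columns.put(name, Collections.nCopies(dataBatch.numRows, fileConstants.get(name)));
      } else {
        result.columns.put(name, dataBatch.columns.get(name));
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<String> projection = List.of("id", "_file", "data");

    // Pretend the native scan was asked only for dataColumns(projection) and returned 2 rows.
    SimpleBatch fromNative = new SimpleBatch(2);
    fromNative.columns.put("id", List.of(1L, 2L));
    fromNative.columns.put("data", List.of("a", "b"));

    // Example per-file constant; the path is made up for illustration.
    Map<String, Object> constants = Map.of("_file", "s3://bucket/table/data/f1.parquet");
    SimpleBatch full = appendConstants(projection, fromNative, constants);

    System.out.println(dataColumns(projection)); // [id, data]
    System.out.println(full.columns);
  }
}
```

Keeping the metadata values out of the native read means the scan never needs to know about `_file` or `_partition`. In Velox itself the appended column would presumably be a constant-encoded vector rather than a materialized list as in this sketch.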
