ndrluis commented on PR #14027: URL: https://github.com/apache/iceberg/pull/14027#issuecomment-3289655448
Quick update on this issue: I'm going to focus on solving this problem on the Java side first. Once Iceberg Java has the correct behavior, I'll come back to PyIceberg and make the necessary adjustments.

Here's the minimal test that I'm running using PySpark (since I'm more familiar with it than with the Java environment).

**Tested with the following Iceberg Runtimes**:

- org.apache.iceberg#iceberg-spark-runtime-3.5_2.12;1.9.0
- org.apache.iceberg#iceberg-spark-runtime-3.5_2.12;1.10.0

**Test Case**

```python
import pytest
from pyspark.sql import SparkSession

from pyiceberg.catalog import Catalog, load_catalog
from pyiceberg.exceptions import NoSuchTableError
from pyiceberg.schema import Schema
from pyiceberg.types import NestedField, UUIDType

# `_create_table` is the helper from the PyIceberg integration test utilities.


@pytest.mark.integration
def test_uuid_write_read_with_pyspark(session_catalog: Catalog, spark: SparkSession) -> None:
    identifier = "default.test_uuid_write_and_read_with_pyspark"
    # Note: this in-memory catalog is not used by the rest of the test.
    catalog = load_catalog("default", type="in-memory")
    catalog.create_namespace("ns")

    schema = Schema(NestedField(field_id=1, name="uuid_col", field_type=UUIDType(), required=False))

    # Start from a clean table.
    try:
        session_catalog.drop_table(identifier=identifier)
    except NoSuchTableError:
        pass

    table = _create_table(session_catalog, identifier, {"format-version": "2"}, schema=schema)

    # Write a single UUID row through Spark.
    spark.sql(
        f"""
        INSERT INTO {identifier} VALUES ("22222222-2222-2222-2222-222222222222")
        """
    )

    df = spark.table(identifier)
    assert df.count() == 1

    # Filtering on the UUID column is what triggers the failure.
    result = df.where("uuid_col = '22222222-2222-2222-2222-222222222222'")
    assert result.count() == 1
```

**Error**

The `df.count()` assertion passes, but the test fails once the WHERE condition is applied, with the following error:

```
25/09/14 12:45:49 ERROR BaseReader: Error reading file(s): s3://warehouse/default/test_uuid_write_and_read_with_pyspark/data/00000-0-c8b11c46-5ef7-426e-a1d5-de8aa720af6d-0-00001.parquet
java.lang.ClassCastException: class java.util.UUID cannot be cast to class java.nio.ByteBuffer (java.util.UUID and java.nio.ByteBuffer are in module java.base of loader 'bootstrap')
    at java.base/java.nio.ByteBuffer.compareTo(ByteBuffer.java:267)
    at java.base/java.util.Comparators$NaturalOrderComparator.compare(Comparators.java:52)
    at java.base/java.util.Comparators$NaturalOrderComparator.compare(Comparators.java:47)
    at org.apache.iceberg.types.Comparators$NullSafeChainedComparator.compare(Comparators.java:253)
    at org.apache.iceberg.parquet.ParquetMetricsRowGroupFilter$MetricsEvalVisitor.eq(ParquetMetricsRowGroupFilter.java:352)
    at org.apache.iceberg.parquet.ParquetMetricsRowGroupFilter$MetricsEvalVisitor.eq(ParquetMetricsRowGroupFilter.java:79)
    at org.apache.iceberg.expressions.ExpressionVisitors$BoundExpressionVisitor.predicate(ExpressionVisitors.java:162)
    at org.apache.iceberg.expressions.ExpressionVisitors.visitEvaluator(ExpressionVisitors.java:390)
    at org.apache.iceberg.expressions.ExpressionVisitors.visitEvaluator(ExpressionVisitors.java:409)
    at org.apache.iceberg.parquet.ParquetMetricsRowGroupFilter$MetricsEvalVisitor.eval(ParquetMetricsRowGroupFilter.java:103)
    at org.apache.iceberg.parquet.ParquetMetricsRowGroupFilter.shouldRead(ParquetMetricsRowGroupFilter.java:73)
    at org.apache.iceberg.parquet.ReadConf.<init>(ReadConf.java:108)
    at org.apache.iceberg.parquet.VectorizedParquetReader.init(VectorizedParquetReader.java:90)
    at org.apache.iceberg.parquet.VectorizedParquetReader.iterator(VectorizedParquetReader.java:99)
    at org.apache.iceberg.spark.source.BatchDataReader.open(BatchDataReader.java:116)
    at org.apache.iceberg.spark.source.BatchDataReader.open(BatchDataReader.java:43)
    at org.apache.iceberg.spark.source.BaseReader.next(BaseReader.java:134)
    at org.apache.spark.sql.execution.datasources.v2.PartitionIterator.hasNext(DataSourceRDD.scala:120)
    at org.apache.spark.sql.execution.datasources.v2.MetricsIterator.hasNext(DataSourceRDD.scala:158)
    [... rest of stack trace ...]
```
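The trace shows a row-group metrics filter comparing a `java.util.UUID` against a `java.nio.ByteBuffer`. As a rough illustration of that kind of mismatch (this is not Iceberg's code and not the eventual fix), the Python sketch below compares a UUID literal against min/max statistics held as raw 16-byte values. The function names `to_parquet_bytes` and `row_group_may_contain` are hypothetical, and unsigned lexicographic byte order is assumed purely for the example.

```python
import uuid


def to_parquet_bytes(value: uuid.UUID) -> bytes:
    """Return the 16-byte big-endian form of a UUID."""
    return value.bytes


def row_group_may_contain(literal: uuid.UUID, lower: bytes, upper: bytes) -> bool:
    """Hypothetical min/max check: True if the literal lies within [lower, upper].

    The literal is converted to bytes first so both sides of the
    comparison use the same representation.
    """
    needle = to_parquet_bytes(literal)
    return lower <= needle <= upper


literal = uuid.UUID("22222222-2222-2222-2222-222222222222")
# Single-row file, so the minimum and maximum statistics are the same value.
stats_min = stats_max = literal.bytes

# Comparing the two representations directly raises TypeError in Python,
# loosely analogous to the ClassCastException in the trace.
try:
    literal < stats_min
    mixed_comparison_failed = False
except TypeError:
    mixed_comparison_failed = True

print(mixed_comparison_failed)
print(row_group_may_contain(literal, stats_min, stats_max))
```

Converting the literal before comparing is only one possible direction; the statistics could equally be decoded into UUIDs instead, as long as both operands end up with the same type.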
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org