szehon-ho opened a new issue, #5132:
URL: https://github.com/apache/iceberg/issues/5132
From the discussion in https://github.com/apache/iceberg/pull/5113 with @huaxingao, I found this behavior:

```
select * from table where table.struct_field = struct(10)
```

> org.apache.spark.sql.AnalysisException: cannot resolve '(table.struct_field = struct(10))' due to data type mismatch: differing types in '(table.struct_field = struct(1))' (struct<nested:int> and struct<col1:int>).; line 1 pos 39;

```
select * from table where table.struct_field in (struct(10))
```

```
java.lang.IllegalArgumentException: Cannot create expression literal from org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema: [1]
	at org.apache.iceberg.expressions.Literals.from(Literals.java:87)
	at org.apache.iceberg.expressions.UnboundPredicate.<init>(UnboundPredicate.java:40)
	at org.apache.iceberg.expressions.Expressions.equal(Expressions.java:175)
	at org.apache.iceberg.spark.SparkFilters.handleEqual(SparkFilters.java:239)
	at org.apache.iceberg.spark.SparkFilters.convert(SparkFilters.java:152)
	at org.apache.iceberg.spark.source.SparkScanBuilder.pushFilters(SparkScanBuilder.java:106)
	at org.apache.spark.sql.execution.datasources.v2.PushDownUtils$.pushFilters(PushDownUtils.scala:69)
	at org.apache.spark.sql.execution.datasources.v2.V2ScanRelationPushDown$$anonfun$pushDownFilters$1.applyOrElse(V2ScanRelationPushDown.scala:60)
	at org.apache.spark.sql.execution.datasources.v2.V2ScanRelationPushDown$$anonfun$pushDownFilters$1.applyOrElse(V2ScanRelationPushDown.scala:47)
```

But on a non-Iceberg Spark table, I get:

```
spark.sql("select * from test_struct_non_iceberg where struct_field in(struct(10))").show
+------------+
|struct_field|
+------------+
|        {10}|
+------------+

scala> spark.sql("select * from test_struct_non_iceberg where struct_field = struct(10)").show
+------------+
|struct_field|
+------------+
|        {10}|
+------------+
```

It's possible that Iceberg cannot handle these filters, since it does not collect metrics for anything other than primitive columns.
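The stack trace shows `SparkFilters.handleEqual` passing the struct value (a `GenericRowWithSchema`) straight to `Expressions.equal`, where `Literals.from` rejects it. A minimal, self-contained sketch of one possible guard is below: decline to convert a filter whose value is not a primitive literal, so the filter is not pushed down and Spark evaluates it instead, rather than throwing. `StructValue`, `isSupportedLiteral`, and `convertEqual` are hypothetical names, the returned string only stands in for an Iceberg expression, and the list of accepted types is an assumption that approximates, but may not exactly match, what `Literals.from` accepts.

```java
import java.math.BigDecimal;
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.UUID;

public class StructFilterGuard {

  // Hypothetical stand-in for Spark's GenericRowWithSchema (a struct value).
  static final class StructValue {
    final List<Object> fields;

    StructValue(Object... fields) {
      this.fields = Arrays.asList(fields);
    }

    @Override
    public String toString() {
      return fields.toString();
    }
  }

  // Assumption: roughly the Java types that can become an Iceberg literal.
  // Anything else (e.g. a struct) is treated as unsupported; null is also unsupported.
  static boolean isSupportedLiteral(Object value) {
    return value instanceof Boolean
        || value instanceof Integer
        || value instanceof Long
        || value instanceof Float
        || value instanceof Double
        || value instanceof CharSequence
        || value instanceof UUID
        || value instanceof byte[]
        || value instanceof ByteBuffer
        || value instanceof BigDecimal;
  }

  // Hypothetical conversion of an equality filter `column = value`.
  // Returns empty instead of throwing, meaning "do not push this filter down".
  static Optional<String> convertEqual(String column, Object value) {
    if (!isSupportedLiteral(value)) {
      return Optional.empty();
    }
    return Optional.of(column + " == " + value);
  }

  public static void main(String[] args) {
    System.out.println(convertEqual("id", 10));                             // Optional[id == 10]
    System.out.println(convertEqual("struct_field", new StructValue(10)));  // Optional.empty
  }
}
```

The design choice here is to treat "cannot convert" as "not pushed down" rather than as an error, which matches the behavior of the non-Iceberg table above, where Spark still returns the matching row.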
So maybe we should not push down these filters at all. There may also be another problem (the returned schema not matching).
