abmo-x opened a new issue, #4718:
URL: https://github.com/apache/iceberg/issues/4718
Steps to reproduce:
1) Create an Iceberg table using the Hive metastore, partitioned by `date` and `hour` columns.
2) Insert a few records.
3) Try to list partitions for the table created in step 1 (a minimal repro sketch follows this list).
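For reference, here is a minimal Spark SQL sketch of the setup described above. The table name `spark_catalog.monitoring.test` is inferred from the queries later in this issue; the column types, the `id` column, and the exact DDL are assumptions, not details from the original report.

```sql
-- Hypothetical repro DDL: table and partition names match the queries
-- in this issue, but column types and the id column are assumptions.
CREATE TABLE spark_catalog.monitoring.test (
  id BIGINT,
  `date` STRING,
  `hour` STRING)
USING iceberg
PARTITIONED BY (`date`, `hour`);

-- Insert a few records so each partition has at least one data file.
INSERT INTO spark_catalog.monitoring.test VALUES
  (1, '2020-01-01', '00'),
  (2, '2020-01-01', '10'),
  (3, '2020-01-02', '00'),
  (4, '2020-01-02', '10');

-- Succeeds: the partition struct (both date and hour) is projected.
SELECT partition FROM spark_catalog.monitoring.test.partitions;

-- Fails: no partition field is projected (see the stack trace below).
SELECT file_count FROM spark_catalog.monitoring.test.partitions;
```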
Select queries against the `partitions` metadata table only succeed if both the `date` and `hour` columns are projected. If either column is skipped, or both are, the query fails.
This is not reproducible with Spark 3.0 and Iceberg 0.11.
See the queries below:
```
spark-sql> select partition from spark_catalog.monitoring.test.partitions;
{"date":"2020-01-02","hour":"10"}
{"date":"2020-01-02","hour":"00"}
{"date":"2020-01-01","hour":"00"}
{"date":"2020-01-01","hour":"10"}
Time taken: 0.191 seconds, Fetched 4 row(s)

spark-sql> select partition.date, partition.hour from spark_catalog.monitoring.test.partitions;
2020-01-02 10
2020-01-02 00
2020-01-01 00
2020-01-01 10
Time taken: 0.198 seconds, Fetched 4 row(s)

spark-sql> select file_count from spark_catalog.monitoring.test.partitions;
22/05/06 16:12:25 ERROR SparkSQLDriver: Failed in [select file_count from spark_catalog.monitoring.test.partitions]
java.lang.IllegalArgumentException: Cannot find source column: partition.date
    at org.apache.iceberg.relocated.com.google.common.base.Preconditions.checkArgument(Preconditions.java:217) ~[ ]
    at org.apache.iceberg.PartitionSpec$Builder.findSourceColumn(PartitionSpec.java:374) ~[ ]
    at org.apache.iceberg.PartitionSpec$Builder.identity(PartitionSpec.java:379) ~[ ]
    at org.apache.iceberg.BaseMetadataTable.lambda$transformSpec$0(BaseMetadataTable.java:68) ~[ ]
    at org.apache.iceberg.relocated.com.google.common.collect.ImmutableList.forEach(ImmutableList.java:405) ~[ ]
    at org.apache.iceberg.BaseMetadataTable.transformSpec(BaseMetadataTable.java:68) ~[ ]
    at org.apache.iceberg.PartitionsTable.planFiles(PartitionsTable.java:114) ~[ ]
    at org.apache.iceberg.PartitionsTable.partitions(PartitionsTable.java:97) ~[ ]
    at org.apache.iceberg.PartitionsTable.task(PartitionsTable.java:75) ~[ ]
    at org.apache.iceberg.PartitionsTable.access$300(PartitionsTable.java:35) ~[ ]
    at org.apache.iceberg.PartitionsTable$PartitionsScan.lambda$new$0(PartitionsTable.java:138) ~[ ]
    at org.apache.iceberg.StaticTableScan.planFiles(StaticTableScan.java:66) ~[ ]
    at org.apache.iceberg.BaseTableScan.planFiles(BaseTableScan.java:209) ~[ ]
    at org.apache.iceberg.spark.source.SparkBatchQueryScan.files(SparkBatchQueryScan.java:179) ~[ ]
    at org.apache.iceberg.spark.source.SparkBatchQueryScan.tasks(SparkBatchQueryScan.java:193) ~[ ]
    at org.apache.iceberg.spark.source.SparkBatchScan.planInputPartitions(SparkBatchScan.java:144) ~[ ]
    at org.apache.spark.sql.execution.datasources.v2.BatchScanExec.partitions$lzycompute(BatchScanExec.scala:52) ~[ ]
    at org.apache.spark.sql.execution.datasources.v2.BatchScanExec.partitions(BatchScanExec.scala:52) ~[ ]
    at org.apache.spark.sql.execution.datasources.v2.DataSourceV2ScanExecBase.supportsColumnar(DataSourceV2ScanExecBase.scala:93) ~[ ]
    at org.apache.spark.sql.execution.datasources.v2.DataSourceV2ScanExecBase.supportsColumnar$(DataSourceV2ScanExecBase.scala:92) ~[ ]
    at org.apache.spark.sql.execution.datasources.v2.BatchScanExec.supportsColumnar(BatchScanExec.scala:35) ~[ ]
    at org.apache.spark.sql.execution.datasources.v2.DataSourceV2Strategy.apply(DataSourceV2Strategy.scala:124) ~[ ]
    at org.apache.spark.sql.catalyst.planning.QueryPlanner.$anonfun$plan$1(QueryPlanner.scala:63) ~[ ]
    at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:486) ~[scala-library-2.12.15.jar:?]
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:492) ~[scala-library-2.12.15.jar:?]
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:491) ~[scala-library-2.12.15.jar:?]
    at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:93) ~[ ]
    at org.apache.spark.sql.execution.SparkStrategies.plan(SparkStrategies.scala:68) ~[ ]
    at org.apache.spark.sql.catalyst.planning.QueryPlanner.$anonfun$plan$3(QueryPlanner.scala:78) ~[ ]
    at scala.collection.TraversableOnce$folder$1.apply(TraversableOnce.scala:196) ~[scala-library-2.12.15.jar:?]
    at scala.collection.TraversableOnce$folder$1.apply(TraversableOnce.scala:194) ~[scala-library-2.12.15.jar:?]
    at scala.collection.Iterator.foreach(Iterator.scala:943) ~[scala-library-2.12.15.jar:?]
    at scala.collection.Iterator.foreach$(Iterator.scala:943) ~[scala-library-2.12.15.jar:?]
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1431) ~[scala-library-2.12.15.jar:?]
    at scala.collection.TraversableOnce.foldLeft(TraversableOnce.scala:199) ~[scala-library-2.12.15.jar:?]
    at scala.collection.TraversableOnce.foldLeft$(TraversableOnce.scala:192) ~[scala-library-2.12.15.jar:?]
    at scala.collection.AbstractIterator.foldLeft(Iterator.scala:1431) ~[scala-library-2.12.15.jar:?]
    at org.apache.spark.sql.catalyst.planning.QueryPlanner.$anonfun$plan$2(QueryPlanner.scala:75) ~[ ]
    at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:486) ~[scala-library-2.12.15.jar:?]
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:492) ~[scala-library-2.12.15.jar:?]
    at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:93) ~[ ]
    at org.apache.spark.sql.execution.SparkStrategies.plan(SparkStrategies.scala:68) ~[ ]
    at org.apache.spark.sql.execution.QueryExecution$.createSparkPlan(QueryExecution.scala:470) ~[ ]
    at org.apache.spark.sql.execution.QueryExecution.$anonfun$sparkPlan$2(QueryExecution.scala:161) ~[ ]
    at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111) ~[ ]
    at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:200) ~[ ]
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:776) ~[ ]
    at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:200) ~[ ]
    at org.apache.spark.sql.execution.QueryExecution.$anonfun$sparkPlan$1(QueryExecution.scala:161) ~[ ]
    at org.apache.spark.sql.execution.QueryExecution.withCteMap(QueryExecution.scala:73) ~[ ]
    at org.apache.spark.sql.execution.QueryExecution.sparkPlan$lzycompute(QueryExecution.scala:154) ~[ ]
    at org.apache.spark.sql.execution.QueryExecution.sparkPlan(QueryExecution.scala:154) ~[ ]
    at org.apache.spark.sql.execution.QueryExecution.$anonfun$executedPlan$2(QueryExecution.scala:174) ~[ ]
    at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111) ~[ ]
    at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:200) ~[ ]
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:776) ~[ ]
    at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:200) ~[ ]
    at org.apache.spark.sql.execution.QueryExecution.$anonfun$executedPlan$1(QueryExecution.scala:174) ~[ ]
    at org.apache.spark.sql.execution.QueryExecution.withCteMap(QueryExecution.scala:73) ~[ ]
    at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:167) ~[ ]
    at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:167) ~[ ]
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:101) ~[ ]
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163) ~[ ]
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90) ~[ ]
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:776) ~[ ]
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64) ~[ ]
    at org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.run(SparkSQLDriver.scala:69) ~[ ]
    at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:383) ~[ ]
    at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.$anonfun$processLine$1(SparkSQLCLIDriver.scala:503) ~[ ]
    at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.$anonfun$processLine$1$adapted(SparkSQLCLIDriver.scala:497) ~[ ]
    at scala.collection.Iterator.foreach(Iterator.scala:943) [scala-library-2.12.15.jar:?]
    at scala.collection.Iterator.foreach$(Iterator.scala:943) [scala-library-2.12.15.jar:?]
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1431) [scala-library-2.12.15.jar:?]
    at scala.collection.IterableLike.foreach(IterableLike.scala:74) [scala-library-2.12.15.jar:?]
    at scala.collection.IterableLike.foreach$(IterableLike.scala:73) [scala-library-2.12.15.jar:?]
    at scala.collection.AbstractIterable.foreach(Iterable.scala:56) [scala-library-2.12.15.jar:?]
```