ayush-san opened a new issue #2327:
URL: https://github.com/apache/iceberg/issues/2327
env:
spark version: 3.0.1
hive version: 3.1.2
hadoop version: 3.2.1-amzn-2
I was trying to do a small POC of Iceberg table partition evolution.
Here is the set of commands I ran in spark-sql:
```
spark-sql --packages org.apache.iceberg:iceberg-spark3-runtime:0.11.0 \
  --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
  --conf spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog \
  --conf spark.sql.catalog.spark_catalog.type=hive \
  --conf spark.sql.catalog.hive=org.apache.iceberg.spark.SparkCatalog \
  --conf spark.sql.catalog.hive.type=hive \
  --conf spark.sql.catalog.hive.uri="thrift://localhost:9083"
```
```
CREATE TABLE hive.db.tb (
id bigint COMMENT 'id doc',
ts timestamp COMMENT 'ts doc')
USING iceberg
PARTITIONED BY (years(ts))
TBLPROPERTIES ('engine.hive.enabled'='true', 'write.format.default'='orc');
INSERT INTO db.tb SELECT 1, to_timestamp("2020-07-06 13:40:00");
INSERT INTO db.tb SELECT 2, to_timestamp("2020-07-06 14:30:00");
INSERT INTO db.tb SELECT 3, to_timestamp("2020-07-06 15:20:00");
select * from db.tb;
```
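For reference, the table's initial schema and partition spec can be expressed with the Iceberg Java API like this (a sketch only; the field IDs are illustrative):
```java
import org.apache.iceberg.PartitionSpec;
import org.apache.iceberg.Schema;
import org.apache.iceberg.types.Types;

// Schema matching CREATE TABLE hive.db.tb above (illustrative field IDs).
// Spark's timestamp type maps to Iceberg's timestamp with time zone.
Schema schema = new Schema(
    Types.NestedField.optional(1, "id", Types.LongType.get(), "id doc"),
    Types.NestedField.optional(2, "ts", Types.TimestampType.withZone(), "ts doc"));

// PARTITIONED BY (years(ts)) builds a spec with a single year transform.
PartitionSpec spec = PartitionSpec.builderFor(schema)
    .year("ts")
    .build();
```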

Describing the table also works at this point:
```
desc hive.db.tb;
```

Up to this point everything works fine. I then decided to drop the years(ts)
partition field and add a new partition field on days(ts). After doing this,
running the `desc` statement again fails with the following error:
```
ALTER TABLE hive.db.tb DROP PARTITION FIELD years(ts);
ALTER TABLE hive.db.tb ADD PARTITION FIELD days(ts);
desc hive.db.tb;
```
```
21/03/11 16:28:23 ERROR SparkSQLDriver: Failed in [desc hive.db.tb]
java.lang.UnsupportedOperationException: Void transform is not supported
    at org.apache.iceberg.transforms.PartitionSpecVisitor.alwaysNull(PartitionSpecVisitor.java:86)
    at org.apache.iceberg.transforms.PartitionSpecVisitor.visit(PartitionSpecVisitor.java:148)
    at org.apache.iceberg.transforms.PartitionSpecVisitor.visit(PartitionSpecVisitor.java:120)
    at org.apache.iceberg.transforms.PartitionSpecVisitor.visit(PartitionSpecVisitor.java:102)
    at org.apache.iceberg.spark.Spark3Util.toTransforms(Spark3Util.java:236)
    at org.apache.iceberg.spark.source.SparkTable.partitioning(SparkTable.java:127)
    at org.apache.spark.sql.execution.datasources.v2.DescribeTableExec.addPartitioning(DescribeTableExec.scala:78)
    at org.apache.spark.sql.execution.datasources.v2.DescribeTableExec.run(DescribeTableExec.scala:41)
    at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result$lzycompute(V2CommandExec.scala:39)
    at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result(V2CommandExec.scala:39)
    at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.executeCollect(V2CommandExec.scala:45)
    at org.apache.spark.sql.Dataset.$anonfun$logicalPlan$1(Dataset.scala:230)
    at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3667)
    at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:104)
    at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:227)
    at org.apache.spark.sql.execution.SQLExecution$.executeQuery$1(SQLExecution.scala:107)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:132)
    at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:104)
    at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:227)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:132)
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:248)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:131)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:764)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:68)
    at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3665)
    at org.apache.spark.sql.Dataset.<init>(Dataset.scala:230)
    at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:101)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:764)
    at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:98)
    at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:607)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:764)
    at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:602)
    at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:650)
    at org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.run(SparkSQLDriver.scala:63)
    at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:377)
    at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.$anonfun$processLine$1(SparkSQLCLIDriver.scala:496)
    at scala.collection.Iterator.foreach(Iterator.scala:941)
    at scala.collection.Iterator.foreach$(Iterator.scala:941)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
    at scala.collection.IterableLike.foreach(IterableLike.scala:74)
    at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
    at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
    at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processLine(SparkSQLCLIDriver.scala:490)
    at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:282)
    at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:936)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1015)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1024)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
```
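For reference, my understanding is that the two ALTER statements above are equivalent to the following Iceberg Java API sequence (a sketch; `table` is assumed to be a `Table` loaded from the Hive catalog), and that in a format-v1 table the dropped field stays in the spec as a void transform, which appears to be what the `PartitionSpecVisitor` rejects:
```java
import org.apache.iceberg.Table;
import org.apache.iceberg.expressions.Expressions;

// Equivalent of:
//   ALTER TABLE hive.db.tb DROP PARTITION FIELD years(ts);
//   ALTER TABLE hive.db.tb ADD PARTITION FIELD days(ts);
// In a v1 table, the removed year field is retained in the partition spec
// as an always-null ("void") transform to keep field ordering stable.
table.updateSpec()
    .removeField(Expressions.year("ts"))
    .addField(Expressions.day("ts"))
    .commit();
```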
I also tried running an INSERT statement on the table, but got the same error
as above:
`INSERT INTO db.tb SELECT 4, to_timestamp("2020-07-06 16:10:00");`
In addition, I get the exception `java.lang.IllegalStateException: Unknown
type for long field. Type name: java.lang.Integer` when running the following
SQL statements:
```
select * from hive.db.tb.files;
select * from hive.db.tb.manifests;
```