JoshuaZhuCN opened a new issue, #8695:
URL: https://github.com/apache/hudi/issues/8695
Failed to add columns to a BUCKET index table
**To Reproduce**
Steps to reproduce the behavior:
1. Create a bucket index table:
```sql
create table if not exists `default`.hudi_table0 (
id int,
name string,
price double,
ts timestamp
) using hudi
options (
type = 'cow',
primaryKey = 'id',
preCombineField = 'ts',
`hoodie.index.type`='BUCKET',
`hoodie.index.bucket.engine`='SIMPLE',
`hoodie.bucket.index.num.buckets`='2',
`hoodie.bucket.index.hash.field`='id',
`hoodie.storage.layout.type`='BUCKET',
`hoodie.storage.layout.partitioner.class`='org.apache.hudi.table.action.commit.SparkBucketIndexPartitioner'
);
```
2. Try to add a new column:
```sql
ALTER TABLE `default`.hudi_table0 ADD COLUMNS(`column1_added` string comment 'first add column');
```
**Expected behavior**
The `ALTER TABLE ... ADD COLUMNS` statement should succeed and add the new column to the table schema.
**Environment Description**
* Hudi version : 0.12.1
* Spark version : 3.1.3
* Hive version : 3.1.0
* Hadoop version : 3.1.1
* Storage (HDFS/S3/GCS..) : HDFS
* Running on Docker? (yes/no) : no
**Stacktrace**
```
ERROR org.apache.spark.sql.hive.thriftserver.SparkSQLDriver - Failed in [ALTER TABLE `default`.hudi_table0 ADD COLUMNS(`column1_added` string comment 'first add column')]
org.apache.hudi.exception.HoodieIndexException: Bucket index key (if configured) must be subset of record key.
    at org.apache.hudi.config.HoodieIndexConfig$Builder.validateBucketIndexConfig(HoodieIndexConfig.java:647) ~[hudi-spark3.1.3-bundle_2.12-0.12.1.jar:0.12.1]
    at org.apache.hudi.config.HoodieIndexConfig$Builder.build(HoodieIndexConfig.java:615) ~[hudi-spark3.1.3-bundle_2.12-0.12.1.jar:0.12.1]
    at org.apache.hudi.config.HoodieWriteConfig$Builder.setDefaults(HoodieWriteConfig.java:2554) ~[hudi-spark3.1.3-bundle_2.12-0.12.1.jar:0.12.1]
    at org.apache.hudi.config.HoodieWriteConfig$Builder.build(HoodieWriteConfig.java:2679) ~[hudi-spark3.1.3-bundle_2.12-0.12.1.jar:0.12.1]
    at org.apache.hudi.DataSourceUtils.createHoodieConfig(DataSourceUtils.java:188) ~[hudi-spark3.1.3-bundle_2.12-0.12.1.jar:0.12.1]
    at org.apache.hudi.DataSourceUtils.createHoodieClient(DataSourceUtils.java:193) ~[hudi-spark3.1.3-bundle_2.12-0.12.1.jar:0.12.1]
    at org.apache.spark.sql.hudi.command.AlterHoodieTableAddColumnsCommand$.commitWithSchema(AlterHoodieTableAddColumnsCommand.scala:110) ~[hudi-spark3.1.3-bundle_2.12-0.12.1.jar:0.12.1]
    at org.apache.spark.sql.hudi.command.AlterHoodieTableAddColumnsCommand.run(AlterHoodieTableAddColumnsCommand.scala:66) ~[hudi-spark3.1.3-bundle_2.12-0.12.1.jar:0.12.1]
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70) ~[spark-sql_2.12-3.1.3.jar:3.1.3]
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68) ~[spark-sql_2.12-3.1.3.jar:3.1.3]
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79) ~[spark-sql_2.12-3.1.3.jar:3.1.3]
    at org.apache.spark.sql.Dataset.$anonfun$logicalPlan$1(Dataset.scala:228) ~[spark-sql_2.12-3.1.3.jar:3.1.3]
    at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3700) ~[spark-sql_2.12-3.1.3.jar:3.1.3]
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103) ~[spark-sql_2.12-3.1.3.jar:3.1.3]
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163) ~[spark-sql_2.12-3.1.3.jar:3.1.3]
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90) ~[spark-sql_2.12-3.1.3.jar:3.1.3]
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775) ~[spark-sql_2.12-3.1.3.jar:3.1.3]
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64) ~[spark-sql_2.12-3.1.3.jar:3.1.3]
    at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3698) ~[spark-sql_2.12-3.1.3.jar:3.1.3]
    at org.apache.spark.sql.Dataset.<init>(Dataset.scala:228) ~[spark-sql_2.12-3.1.3.jar:3.1.3]
    at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:99) ~[spark-sql_2.12-3.1.3.jar:3.1.3]
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775) ~[spark-sql_2.12-3.1.3.jar:3.1.3]
    at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:96) ~[spark-sql_2.12-3.1.3.jar:3.1.3]
    at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:618) ~[spark-sql_2.12-3.1.3.jar:3.1.3]
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775) ~[spark-sql_2.12-3.1.3.jar:3.1.3]
    at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:613) ~[spark-sql_2.12-3.1.3.jar:3.1.3]
    at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:650) ~[spark-sql_2.12-3.1.3.jar:3.1.3]
    at org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.run(SparkSQLDriver.scala:67) ~[spark-hive-thriftserver_2.12-3.1.3.jar:3.1.3]
    at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:381) ~[spark-hive-thriftserver_2.12-3.1.3.jar:3.1.3]
    at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.$anonfun$processLine$1(SparkSQLCLIDriver.scala:500) ~[spark-hive-thriftserver_2.12-3.1.3.jar:3.1.3]
    at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.$anonfun$processLine$1$adapted(SparkSQLCLIDriver.scala:494) ~[spark-hive-thriftserver_2.12-3.1.3.jar:3.1.3]
    at scala.collection.Iterator.foreach(Iterator.scala:941) [scala-library-2.12.10.jar:?]
    at scala.collection.Iterator.foreach$(Iterator.scala:941) [scala-library-2.12.10.jar:?]
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1429) [scala-library-2.12.10.jar:?]
    at scala.collection.IterableLike.foreach(IterableLike.scala:74) [scala-library-2.12.10.jar:?]
    at scala.collection.IterableLike.foreach$(IterableLike.scala:73) [scala-library-2.12.10.jar:?]
    at scala.collection.AbstractIterable.foreach(Iterable.scala:56) [scala-library-2.12.10.jar:?]
    at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processLine(SparkSQLCLIDriver.scala:494) [spark-hive-thriftserver_2.12-3.1.3.jar:3.1.3]
    at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:284) [spark-hive-thriftserver_2.12-3.1.3.jar:3.1.3]
    at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala) [spark-hive-thriftserver_2.12-3.1.3.jar:3.1.3]
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_271]
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_271]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_271]
    at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_271]
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52) [spark-core_2.12-3.1.3.jar:3.1.3]
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951) [spark-core_2.12-3.1.3.jar:3.1.3]
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180) [spark-core_2.12-3.1.3.jar:3.1.3]
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203) [spark-core_2.12-3.1.3.jar:3.1.3]
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90) [spark-core_2.12-3.1.3.jar:3.1.3]
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1039) [spark-core_2.12-3.1.3.jar:3.1.3]
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1048) [spark-core_2.12-3.1.3.jar:3.1.3]
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) [spark-core_2.12-3.1.3.jar:3.1.3]
```
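For context (my reading of the error, not a confirmed root cause): the table options above look self-consistent, since `hoodie.bucket.index.hash.field` (`id`) equals the record key (`primaryKey = 'id'`), yet the validation at `HoodieIndexConfig.validateBucketIndexConfig` (HoodieIndexConfig.java:647 in the stacktrace) still rejects the write triggered by `ALTER TABLE`. That check requires the bucket hash field(s) to be a subset of the record key field(s). A minimal Python sketch of that subset condition (function and parameter names here are illustrative, not Hudi's internals):

```python
def bucket_hash_field_valid(record_key_fields, bucket_hash_fields):
    """Bucket index key (if configured) must be a subset of the record key."""
    return set(bucket_hash_fields).issubset(set(record_key_fields))

# The reproducer's settings satisfy the condition in isolation:
print(bucket_hash_field_valid(["id"], ["id"]))   # True
# But if the ALTER TABLE code path rebuilt its write config without
# carrying over the record key, the same hash field would be rejected
# (illustration of one way the error could arise, not a verified diagnosis):
print(bucket_hash_field_valid([], ["id"]))       # False
```

This suggests the failure is specific to how the `ADD COLUMNS` command constructs its write config rather than to the table definition itself, but I have not confirmed that in the Hudi source.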