FANNG1 opened a new issue, #6403:
URL: https://github.com/apache/kyuubi/issues/6403

   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
   
   
   ### Search before asking
   
   - [X] I have searched in the [issues](https://github.com/apache/kyuubi/issues?q=is%3Aissue) and found no similar issues.
   
   
   ### Describe the bug
   
   Creating a table fails when the partition column is not the last column of the schema. It can be reproduced as follows:
   
   ```scala
   import org.apache.spark.sql.{Row, SparkSession}
   import org.apache.spark.sql.types.{DataTypes, StructField, StructType}
   import scala.collection.mutable.ListBuffer

   val schema = StructType(Array(
     StructField("name", DataTypes.StringType, nullable = false),
     StructField("favorite_color", DataTypes.StringType, nullable = false),
     StructField("favorite_numbers", DataTypes.StringType, nullable = false)
   ))

   val data = ListBuffer[Row]()
   data += Row("Alyssa", "blue", "1")
   data += Row("Ben", "red", "2")

   val usersDF = spark.createDataFrame(spark.sparkContext.parallelize(data), schema)

   // "favorite_color" is not the last column of the schema
   usersDF.write.partitionBy("favorite_color").saveAsTable("users3")
   ```
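
   As a hedged workaround sketch (untested here, and `users3_reordered` is a made-up table name): since the connector treats the trailing columns of the schema as the partition columns (see "Additional context" below), reordering the DataFrame so the partition column comes last should avoid the error.

   ```scala
   // Workaround sketch: move the partition column to the end of the schema
   // before writing, so the positional split picks the right column.
   val reorderedDF = usersDF.select("name", "favorite_numbers", "favorite_color")
   reorderedDF.write.partitionBy("favorite_color").saveAsTable("users3_reordered")
   ```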
   
   ### Affects Version(s)
   
   1.8.1
   
   ### Kyuubi Server Log Output
   
   _No response_
   
   ### Kyuubi Engine Log Output
   
   ```log
    Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Exception when loading 2 in table users4 with loadPath=hdfs://localhost:9000/user/hive/warehouse/users4/.hive-staging_hive_2024-05-21_18-42-55_548_3233995390238788603-1/-ext-10000
        at org.apache.hadoop.hive.ql.metadata.Hive.loadDynamicPartitions(Hive.java:1970)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.sql.hive.client.Shim_v2_1.loadDynamicPartitions(HiveShim.scala:1605)
        at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$loadDynamicPartitions$1(HiveClientImpl.scala:977)
        at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
        at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$withHiveState$1(HiveClientImpl.scala:303)
        at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:234)
        at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:233)
        at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:283)
        at org.apache.spark.sql.hive.client.HiveClientImpl.loadDynamicPartitions(HiveClientImpl.scala:968)
        at org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$loadDynamicPartitions$1(HiveExternalCatalog.scala:966)
        at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
        at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:101)
        ... 87 more
    Caused by: java.util.concurrent.ExecutionException: org.apache.hadoop.hive.ql.metadata.Table$ValidationFailureSemanticException: Partition spec {favorite_color=, favorite_numbers=2} contains non-partition columns
        at java.util.concurrent.FutureTask.report(FutureTask.java:122)
        at java.util.concurrent.FutureTask.get(FutureTask.java:192)
        at org.apache.hadoop.hive.ql.metadata.Hive.loadDynamicPartitions(Hive.java:1961)
        ... 102 more
    Caused by: org.apache.hadoop.hive.ql.metadata.Table$ValidationFailureSemanticException: Partition spec {favorite_color=, favorite_numbers=2} contains non-partition columns
        at org.apache.hadoop.hive.ql.metadata.Table.validatePartColumnNames(Table.java:384)
        at org.apache.hadoop.hive.ql.metadata.Hive.getPartition(Hive.java:2232)
        at org.apache.hadoop.hive.ql.metadata.Hive.getPartition(Hive.java:2188)
        at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1618)
        at org.apache.hadoop.hive.ql.metadata.Hive$3.call(Hive.java:1929)
        at org.apache.hadoop.hive.ql.metadata.Hive$3.call(Hive.java:1920)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
    org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: Exception when loading 2 in table users4 with loadPath=hdfs://localhost:9000/user/hive/warehouse/users4/.hive-staging_hive_2024-05-21_18-42-55_548_3233995390238788603-1/-ext-10000
        at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:110)
        at org.apache.spark.sql.hive.HiveExternalCatalog.loadDynamicPartitions(HiveExternalCatalog.scala:946)
        at org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.loadDynamicPartitions(ExternalCatalogWithListener.scala:189)
        at org.apache.kyuubi.spark.connector.hive.write.HiveBatchWrite.commitToMetastore(HiveBatchWrite.scala:188)
        at org.apache.kyuubi.spark.connector.hive.write.HiveBatchWrite.commit(HiveBatchWrite.scala:63)
        at org.apache.spark.sql.execution.datasources.v2.V2TableWriteExec.writeWithV2(WriteToDataSourceV2Exec.scala:422)
        at org.apache.spark.sql.execution.datasources.v2.V2TableWriteExec.writeWithV2$(WriteToDataSourceV2Exec.scala:382)
        at org.apache.spark.sql.execution.datasources.v2.CreateTableAsSelectExec.writeWithV2(WriteToDataSourceV2Exec.scala:68)
        at org.apache.spark.sql.execution.datasources.v2.TableWriteExecHelper.$anonfun$writeToTable$1(WriteToDataSourceV2Exec.scala:599)
        at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1563)
        at org.apache.spark.sql.execution.datasources.v2.TableWriteExecHelper.writeToTable(WriteToDataSourceV2Exec.scala:587)
        at org.apache.spark.sql.execution.datasources.v2.TableWriteExecHelper.writeToTable$(WriteToDataSourceV2Exec.scala:582)
        at org.apache.spark.sql.execution.datasources.v2.CreateTableAsSelectExec.writeToTable(WriteToDataSourceV2Exec.scala:68)
        at org.apache.spark.sql.execution.datasources.v2.CreateTableAsSelectExec.run(WriteToDataSourceV2Exec.scala:93)
        at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result$lzycompute(V2CommandExec.scala:43)
        at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result(V2CommandExec.scala:43)
        at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.executeCollect(V2CommandExec.scala:49)
        at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:98)
        at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:118)
        at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:195)
        at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:103)
        at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:827)
        at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:65)
        at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:98)
        at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:94)
        at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:512)
        at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:104)
        at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:512)
        at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:31)
        at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
        at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
        at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:31)
        at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:31)
        at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:488)
        at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:94)
        at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:81)
        at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:79)
        at org.apache.spark.sql.execution.QueryExecution.assertCommandExecuted(QueryExecution.scala:133)
        at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:856)
        at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:633)
        at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:563)
        ... 47 elided
   ```
   
   
   ### Kyuubi Server Configurations
   
   ```yaml
   no
   ```
   
   
   ### Kyuubi Engine Configurations
   
   ```yaml
   no
   ```
   
   
   ### Additional context
   
   The direct cause: the Hive connector's write path splits data and partition columns by position, assuming the partition columns are the trailing columns of the schema:
   ```scala
   private val allColumns = info.schema().toAttributes
   private val dataColumns = allColumns.take(allColumns.length - hiveTable.getPartCols.size())
   private val partColumns = allColumns.takeRight(hiveTable.getPartCols.size())
   ```
   When `favorite_color` is not the last column, `takeRight` selects `favorite_numbers` as the partition column instead, which yields the invalid partition spec `{favorite_color=, favorite_numbers=2}` shown in the log above.
   
   ### Are you willing to submit PR?
   
   - [ ] Yes. I would be willing to submit a PR with guidance from the Kyuubi community to fix.
   - [X] No. I cannot submit a PR at this time.

