xqy179 opened a new issue #2234:
URL: https://github.com/apache/hudi/issues/2234


   **Describe the problem you faced**
   
   I write a Hudi table with the Spark datasource API. The table contains the 
three fields `year`, `month`, and `day`, which I use as the partition keys. The 
partition value extractor is 
"org.apache.hudi.hive.MultiPartKeysValueExtractor", and 
HIVE_SYNC_ENABLED_OPT_KEY is set to "true". After I finish writing data to 
storage (HDFS/S3), Hudi automatically syncs the table partitions to the Hive 
table. The problem happens during this HiveSync processing. By the way, my Hudi 
version is 0.6.1.
   
   
   
   **Additional context**
   After tracing the code, I think this is a bug in the following 
**'syncPartitions'** code. For example, suppose the Hive/Hudi table already 
contains the partition 'year=2020/month=01/day=05'. When you write a new 
partition 'year=2020/month=05/day=01' to the table, it throws an error. In the 
**getPartitionEvents** method, 'year=2020/month=01/day=05' is transformed into 
"01, 05, 2020", and 'year=2020/month=05/day=01' is also transformed into 
"01, 05, 2020", because the partition values are sorted before being joined. So 
the new partition 'year=2020/month=05/day=01' is treated as an update event 
when it is actually a new partition, and the subsequent processing updates the 
table partitions with **"ALTER TABLE `orders` PARTITION 
(`year`='2020',`month`='11',`day`='01') SET LOCATION ..."**. Updating a 
partition that does not exist throws the error.
   
   
   I think I can comment out the following two lines to fix the error without 
affecting other features. Can I?
   ```
     List<PartitionEvent>  getPartitionEvents()
   ...
   //Collections.sort(hivePartitionValues);
   ...
   //Collections.sort(storagePartitionValues);
   ...
   ```
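The collision described above can be reproduced with a minimal, self-contained sketch of the sort-then-join key logic (the class and method names here are hypothetical, not from the Hudi codebase):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class PartitionKeyCollision {

    // Mimics the key-building logic in getPartitionEvents: sort the
    // partition values, then join them into a single lookup key.
    public static String buildKey(List<String> values) {
        List<String> copy = new ArrayList<>(values);
        Collections.sort(copy);
        return String.join(", ", copy);
    }

    public static void main(String[] args) {
        // Two distinct partitions:
        //   year=2020/month=01/day=05  vs  year=2020/month=05/day=01
        List<String> existing = Arrays.asList("2020", "01", "05");
        List<String> incoming = Arrays.asList("2020", "05", "01");

        // After sorting, both collapse to "01, 05, 2020", so the new
        // partition is mistaken for an update of the existing one.
        System.out.println(buildKey(existing).equals(buildKey(incoming))); // true

        // Without the sort, the joined keys stay distinct and the new
        // partition would correctly produce an add event.
        System.out.println(
            String.join(", ", existing).equals(String.join(", ", incoming))); // false
    }
}
```

This is only meant to show why sorting makes the lookup key lossy when the same value set can occur in different partition columns.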
   
   ```java
   /**
    * Iterate over the storage partitions and find if there are any new partitions that need to be added or updated.
    * Generate a list of PartitionEvent based on the changes required.
    */
   List<PartitionEvent> getPartitionEvents(List<Partition> tablePartitions, List<String> partitionStoragePartitions) {
     Map<String, String> paths = new HashMap<>();
     for (Partition tablePartition : tablePartitions) {
       List<String> hivePartitionValues = tablePartition.getValues();
       Collections.sort(hivePartitionValues); // **Maybe there is a bug here!**
       String fullTablePartitionPath =
           Path.getPathWithoutSchemeAndAuthority(new Path(tablePartition.getSd().getLocation())).toUri().getPath();
       paths.put(String.join(", ", hivePartitionValues), fullTablePartitionPath);
     }

     List<PartitionEvent> events = new ArrayList<>();
     for (String storagePartition : partitionStoragePartitions) {
       Path storagePartitionPath = FSUtils.getPartitionPath(syncConfig.basePath, storagePartition);
       String fullStoragePartitionPath = Path.getPathWithoutSchemeAndAuthority(storagePartitionPath).toUri().getPath();
       // Check if the partition values or if hdfs path is the same
       List<String> storagePartitionValues = partitionValueExtractor.extractPartitionValuesInPath(storagePartition);
       Collections.sort(storagePartitionValues); // **Maybe there is a bug here!**

       if (!storagePartitionValues.isEmpty()) {
         String storageValue = String.join(", ", storagePartitionValues);
         if (!paths.containsKey(storageValue)) {
           events.add(PartitionEvent.newPartitionAddEvent(storagePartition));
         } else if (!paths.get(storageValue).equals(fullStoragePartitionPath)) {
           events.add(PartitionEvent.newPartitionUpdateEvent(storagePartition));
         }
       }
     }
     return events;
   }
   ```
   
   
   
   **Stacktrace**
   
   ```
   20/11/05 17:56:17 ERROR HiveSyncTool: Got runtime exception when hive syncing
   org.apache.hudi.hive.HoodieHiveSyncException: Failed to sync partitions for table  xtable
        at org.apache.hudi.hive.HiveSyncTool.syncPartitions(HiveSyncTool.java:187)
        at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:126)
        at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:87)
        at org.apache.hudi.HoodieSparkSqlWriter$.syncHive(HoodieSparkSqlWriter.scala:228)
        at org.apache.hudi.HoodieSparkSqlWriter$.checkWriteStatus(HoodieSparkSqlWriter.scala:278)
        at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:183)
        at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:91)
        at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
        at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
        at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
        at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:156)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
        at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
        at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
        at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
        at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
        at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
        at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
        at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
        at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
        at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676)
        at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:285)
        at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271)
        at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:229)
        at com.pupu.bigdata.wrangling.utils.HudiUtils$.write_hudi_table(HudiUtils.scala:89)
        at com.pupu.bigdata.wrangling.ods.StageToOdsHudi$.writeDataToHudiTbl(StageToOdsHudi.scala:356)
        at com.pupu.bigdata.wrangling.ods.StageToOdsHudi$.execute_cur_hour_etl(StageToOdsHudi.scala:264)
        at com.pupu.bigdata.wrangling.ods.StageToOdsHudi$$anonfun$main$1.apply(StageToOdsHudi.scala:96)
        at com.pupu.bigdata.wrangling.ods.StageToOdsHudi$$anonfun$main$1.apply(StageToOdsHudi.scala:84)
        at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
        at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
        at com.pupu.bigdata.wrangling.ods.StageToOdsHudi$.main(StageToOdsHudi.scala:84)
        at com.pupu.bigdata.wrangling.ods.StageToOdsHudi.main(StageToOdsHudi.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:684)
   Caused by: org.apache.hudi.hive.HoodieHiveSyncException: Failed in executing SQL ALTER TABLE ` xtable` PARTITION (`year`='2020',`month`='11',`day`='01') SET LOCATION 's3://*/year=2020/month=11/day=01'
        at org.apache.hudi.hive.HoodieHiveClient.updateHiveSQL(HoodieHiveClient.java:488)
        at org.apache.hudi.hive.HoodieHiveClient.updatePartitionsToTable(HoodieHiveClient.java:160)
        at org.apache.hudi.hive.HiveSyncTool.syncPartitions(HiveSyncTool.java:185)
        ... 41 more
   Caused by: org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: SemanticException [Error 10006]: Partition not found {year=2020, month=11, day=01}
        at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:256)
        at org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:242)
        at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:254)
        at org.apache.hudi.hive.HoodieHiveClient.updateHiveSQL(HoodieHiveClient.java:486)
        ... 43 more
   Caused by: org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: SemanticException [Error 10006]: Partition not found {year=2020, month=11, day=01}
        at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:380)
        at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:206)
        at org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:290)
        at org.apache.hive.service.cli.operation.Operation.run(Operation.java:320)
        at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:530)
        at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:517)
        at sun.reflect.GeneratedMethodAccessor49.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78)
        at org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36)
        at org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
        at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59)
        at com.sun.proxy.$Proxy37.executeStatementAsync(Unknown Source)
        at org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:310)
        at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:530)
        at org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1437)
        at org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1422)
        at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
        at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
        at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)
        at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
   Caused by: org.apache.hadoop.hive.ql.parse.SemanticException: Partition not found {year=2020, month=11, day=01}
        at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.getPartition(BaseSemanticAnalyzer.java:1736)
        at org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.addInputsOutputsAlterTable(DDLSemanticAnalyzer.java:1515)
        at org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.addInputsOutputsAlterTable(DDLSemanticAnalyzer.java:1479)
        at org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.analyzeAlterTableLocation(DDLSemanticAnalyzer.java:1567)
        at org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.analyzeInternal(DDLSemanticAnalyzer.java:303)
        at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:258)
        at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:512)
        at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1317)
        at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1295)
        at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:204)
        ... 26 more
   ```
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

