[ 
https://issues.apache.org/jira/browse/HUDI-4966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo updated HUDI-4966:
----------------------------
    Description: 
For Deltastreamer, when TimestampBasedKeyGenerator is used with a partition-path 
output format containing slashes, e.g., "yyyy/MM/dd", and hive-style 
partitioning is disabled (the default), meta sync fails.
{code:java}
--hoodie-conf hoodie.datasource.write.partitionpath.field=createdDate
--hoodie-conf 
hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.TimestampBasedKeyGenerator
--hoodie-conf hoodie.deltastreamer.keygen.timebased.timezone=GMT
--hoodie-conf hoodie.deltastreamer.keygen.timebased.output.dateformat=yyyy/MM/dd
--hoodie-conf 
hoodie.deltastreamer.keygen.timebased.timestamp.type=EPOCHMILLISECONDS {code}
Hive Sync exception:
{code:java}
Exception in thread "main" org.apache.hudi.exception.HoodieException: Could not 
sync using the meta sync class org.apache.hudi.hive.HiveSyncTool
    at 
org.apache.hudi.sync.common.util.SyncUtilHelpers.runHoodieMetaSync(SyncUtilHelpers.java:58)
    at 
org.apache.hudi.utilities.deltastreamer.DeltaSync.runMetaSync(DeltaSync.java:719)
    at 
org.apache.hudi.utilities.deltastreamer.DeltaSync.writeToSink(DeltaSync.java:637)
    at 
org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:337)
    at 
org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.lambda$sync$2(HoodieDeltaStreamer.java:204)
    at org.apache.hudi.common.util.Option.ifPresent(Option.java:97)
    at 
org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.sync(HoodieDeltaStreamer.java:202)
    at 
org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:571)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at 
org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at 
org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:955)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
    at 
org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1043)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1052)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: org.apache.hudi.exception.HoodieException: Got runtime exception 
when hive syncing test_table
    at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:145)
    at 
org.apache.hudi.sync.common.util.SyncUtilHelpers.runHoodieMetaSync(SyncUtilHelpers.java:56)
    ... 19 more
Caused by: org.apache.hudi.hive.HoodieHiveSyncException: Failed to sync 
partitions for table test_table
    at org.apache.hudi.hive.HiveSyncTool.syncPartitions(HiveSyncTool.java:341)
    at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:232)
    at org.apache.hudi.hive.HiveSyncTool.doSync(HiveSyncTool.java:154)
    at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:142)
    ... 20 more
Caused by: org.apache.hudi.hive.HoodieHiveSyncException: default.test_table add 
partition failed
    at 
org.apache.hudi.hive.ddl.HMSDDLExecutor.addPartitionsToTable(HMSDDLExecutor.java:217)
    at 
org.apache.hudi.hive.HoodieHiveSyncClient.addPartitionsToTable(HoodieHiveSyncClient.java:107)
    at org.apache.hudi.hive.HiveSyncTool.syncPartitions(HiveSyncTool.java:324)
    ... 23 more
Caused by: MetaException(message:Invalid partition key & values; keys 
[createddate, ], values [2022, 10, 02, ])
    at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$add_partitions_req_result$add_partitions_req_resultStandardScheme.read(ThriftHiveMetastore.java)
    at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$add_partitions_req_result$add_partitions_req_resultStandardScheme.read(ThriftHiveMetastore.java)
    at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$add_partitions_req_result.read(ThriftHiveMetastore.java)
    at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:88)
    at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_add_partitions_req(ThriftHiveMetastore.java:1911)
    at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.add_partitions_req(ThriftHiveMetastore.java:1898)
    at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.add_partitions(HiveMetaStoreClient.java:625)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at 
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:173)
    at com.sun.proxy.$Proxy44.add_partitions(Unknown Source)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:2327)
    at com.sun.proxy.$Proxy44.add_partitions(Unknown Source)
    at 
org.apache.hudi.hive.ddl.HMSDDLExecutor.addPartitionsToTable(HMSDDLExecutor.java:212)
    ... 25 more{code}
Glue Sync exception:
{code:java}
Exception in thread "main" org.apache.hudi.exception.HoodieException: Could not 
sync using the meta sync class org.apache.hudi.aws.sync.AwsGlueCatalogSyncTool
        at 
org.apache.hudi.sync.common.util.SyncUtilHelpers.runHoodieMetaSync(SyncUtilHelpers.java:58)
        at 
org.apache.hudi.utilities.deltastreamer.DeltaSync.runMetaSync(DeltaSync.java:719)
        at 
org.apache.hudi.utilities.deltastreamer.DeltaSync.writeToSink(DeltaSync.java:637)
        at 
org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:337)
        at 
org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.lambda$sync$2(HoodieDeltaStreamer.java:204)
        at org.apache.hudi.common.util.Option.ifPresent(Option.java:97)
        at 
org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.sync(HoodieDeltaStreamer.java:202)
        at 
org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:571)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at 
org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at 
org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:955)
        at 
org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
        at 
org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1043)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1052)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: org.apache.hudi.exception.HoodieException: Got runtime exception 
when hive syncing test_table
        at 
org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:145)
        at 
org.apache.hudi.sync.common.util.SyncUtilHelpers.runHoodieMetaSync(SyncUtilHelpers.java:56)
        ... 19 more
Caused by: org.apache.hudi.hive.HoodieHiveSyncException: Failed to sync 
partitions for table test_table
        at 
org.apache.hudi.hive.HiveSyncTool.syncPartitions(HiveSyncTool.java:341)
        at 
org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:232)
        at org.apache.hudi.hive.HiveSyncTool.doSync(HiveSyncTool.java:154)
        at 
org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:142)
        ... 20 more
Caused by: org.apache.hudi.aws.sync.HoodieGlueSyncException: Fail to add 
partitions to default.test_table
        at 
org.apache.hudi.aws.sync.AWSGlueCatalogSyncClient.addPartitionsToTable(AWSGlueCatalogSyncClient.java:147)
        at 
org.apache.hudi.hive.HiveSyncTool.syncPartitions(HiveSyncTool.java:324)
        ... 23 more
Caused by: 
org.apache.hudi.com.amazonaws.services.glue.model.InvalidInputException: The 
number of partition keys do not match the number of partition values (Service: 
AWSGlue; Status Code: 400; Error Code: InvalidInputException; Request ID: 
e8d9adf2-13c4-4589-bbec-c578a827749f; Proxy: null)
        at 
org.apache.hudi.com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1819)
        at 
org.apache.hudi.com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleServiceErrorResponse(AmazonHttpClient.java:1403)
        at 
org.apache.hudi.com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1372)
        at 
org.apache.hudi.com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1145)
        at 
org.apache.hudi.com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:802)
        at 
org.apache.hudi.com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:770)
        at 
org.apache.hudi.com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:744)
        at 
org.apache.hudi.com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:704)
        at 
org.apache.hudi.com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:686)
        at 
org.apache.hudi.com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:550)
        at 
org.apache.hudi.com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:530)
        at 
org.apache.hudi.com.amazonaws.services.glue.AWSGlueClient.doInvoke(AWSGlueClient.java:10640)
        at 
org.apache.hudi.com.amazonaws.services.glue.AWSGlueClient.invoke(AWSGlueClient.java:10607)
        at 
org.apache.hudi.com.amazonaws.services.glue.AWSGlueClient.invoke(AWSGlueClient.java:10596)
        at 
org.apache.hudi.com.amazonaws.services.glue.AWSGlueClient.executeBatchCreatePartition(AWSGlueClient.java:259)
        at 
org.apache.hudi.com.amazonaws.services.glue.AWSGlueClient.batchCreatePartition(AWSGlueClient.java:228)
        at 
org.apache.hudi.aws.sync.AWSGlueCatalogSyncClient.addPartitionsToTable(AWSGlueCatalogSyncClient.java:139)
        ... 24 more {code}
The exception is thrown because the partition path values for meta sync are not 
properly extracted.  "hoodie.datasource.hive_sync.partition_extractor_class" 
determines which partition extractor to use, and in this case 
`MultiPartKeysValueExtractor` is inferred.  The root cause is that this 
extractor splits the partition path on slashes, so an output date format 
containing slashes breaks the extraction: the single partition column 
`createddate` yields three values (`2022`, `10`, `02`), and the partition key 
and value counts no longer match.

The fix is to introduce a new partition extractor that treats the partition 
path as a whole when there is only a single partition column, instead of 
relying on `MultiPartKeysValueExtractor`.
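To make the key/value mismatch concrete, here is a minimal sketch of the two extraction strategies.  The class and method names below are hypothetical illustrations, not Hudi's actual extractor classes; the first method mirrors the slash-splitting behavior of `MultiPartKeysValueExtractor`, the second shows the proposed single-part behavior.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of partition-value extraction (not Hudi's actual code).
public class PartitionExtractorSketch {

    // Mirrors MultiPartKeysValueExtractor: split the partition path on "/".
    // For "2022/10/02" this yields three values against one partition key.
    static List<String> multiPartExtract(String partitionPath) {
        return Arrays.asList(partitionPath.split("/"));
    }

    // Proposed single-part behavior: treat the whole path as one value,
    // so one partition key maps to exactly one value.
    static List<String> singlePartExtract(String partitionPath) {
        return Collections.singletonList(partitionPath);
    }

    public static void main(String[] args) {
        String path = "2022/10/02"; // produced by output.dateformat=yyyy/MM/dd
        System.out.println(multiPartExtract(path));  // three values for one key -> sync fails
        System.out.println(singlePartExtract(path)); // one value for one key -> sync succeeds
    }
}
```

With a single `createddate` partition column, the first strategy hands the metastore three values for one key, which is exactly the "Invalid partition key & values" / "number of partition keys do not match" error above.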

> Meta sync throws exception if TimestampBasedKeyGenerator is used to generate 
> partition path containing slashes
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: HUDI-4966
>                 URL: https://issues.apache.org/jira/browse/HUDI-4966
>             Project: Apache Hudi
>          Issue Type: Bug
>            Reporter: Ethan Guo
>            Assignee: Ethan Guo
>            Priority: Critical
>             Fix For: 0.12.1
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
