[
https://issues.apache.org/jira/browse/HUDI-4966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ethan Guo updated HUDI-4966:
----------------------------
Description:
For Deltastreamer, when using TimestampBasedKeyGenerator with the output format
of partition path containing slashes, e.g., "yyyy/MM/dd", and hive-style
partitioning disabled (by default), the meta sync fails.
{code:java}
--hoodie-conf hoodie.datasource.write.partitionpath.field=createdDate
--hoodie-conf
hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.TimestampBasedKeyGenerator
--hoodie-conf hoodie.deltastreamer.keygen.timebased.timezone=GMT
--hoodie-conf hoodie.deltastreamer.keygen.timebased.output.dateformat=yyyy/MM/dd
--hoodie-conf
hoodie.deltastreamer.keygen.timebased.timestamp.type=EPOCHMILLISECONDS {code}
Hive Sync exception:
{code:java}
Exception in thread "main" org.apache.hudi.exception.HoodieException: Could not
sync using the meta sync class org.apache.hudi.hive.HiveSyncTool
at
org.apache.hudi.sync.common.util.SyncUtilHelpers.runHoodieMetaSync(SyncUtilHelpers.java:58)
at
org.apache.hudi.utilities.deltastreamer.DeltaSync.runMetaSync(DeltaSync.java:719)
at
org.apache.hudi.utilities.deltastreamer.DeltaSync.writeToSink(DeltaSync.java:637)
at
org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:337)
at
org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.lambda$sync$2(HoodieDeltaStreamer.java:204)
at org.apache.hudi.common.util.Option.ifPresent(Option.java:97)
at
org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.sync(HoodieDeltaStreamer.java:202)
at
org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:571)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at
org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at
org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:955)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
at
org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1043)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1052)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: org.apache.hudi.exception.HoodieException: Got runtime exception
when hive syncing test_table
at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:145)
at
org.apache.hudi.sync.common.util.SyncUtilHelpers.runHoodieMetaSync(SyncUtilHelpers.java:56)
... 19 more
Caused by: org.apache.hudi.hive.HoodieHiveSyncException: Failed to sync
partitions for table test_table
at org.apache.hudi.hive.HiveSyncTool.syncPartitions(HiveSyncTool.java:341)
at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:232)
at org.apache.hudi.hive.HiveSyncTool.doSync(HiveSyncTool.java:154)
at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:142)
... 20 more
Caused by: org.apache.hudi.hive.HoodieHiveSyncException: default.test_table add
partition failed
at
org.apache.hudi.hive.ddl.HMSDDLExecutor.addPartitionsToTable(HMSDDLExecutor.java:217)
at
org.apache.hudi.hive.HoodieHiveSyncClient.addPartitionsToTable(HoodieHiveSyncClient.java:107)
at org.apache.hudi.hive.HiveSyncTool.syncPartitions(HiveSyncTool.java:324)
... 23 more
Caused by: MetaException(message:Invalid partition key & values; keys
[createddate, ], values [2022, 10, 02, ])
at
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$add_partitions_req_result$add_partitions_req_resultStandardScheme.read(ThriftHiveMetastore.java)
at
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$add_partitions_req_result$add_partitions_req_resultStandardScheme.read(ThriftHiveMetastore.java)
at
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$add_partitions_req_result.read(ThriftHiveMetastore.java)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:88)
at
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_add_partitions_req(ThriftHiveMetastore.java:1911)
at
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.add_partitions_req(ThriftHiveMetastore.java:1898)
at
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.add_partitions(HiveMetaStoreClient.java:625)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:173)
at com.sun.proxy.$Proxy44.add_partitions(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at
org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:2327)
at com.sun.proxy.$Proxy44.add_partitions(Unknown Source)
at
org.apache.hudi.hive.ddl.HMSDDLExecutor.addPartitionsToTable(HMSDDLExecutor.java:212)
... 25 more{code}
Glue Sync exception:
{code:java}
Exception in thread "main" org.apache.hudi.exception.HoodieException: Could not
sync using the meta sync class org.apache.hudi.aws.sync.AwsGlueCatalogSyncTool
at
org.apache.hudi.sync.common.util.SyncUtilHelpers.runHoodieMetaSync(SyncUtilHelpers.java:58)
at
org.apache.hudi.utilities.deltastreamer.DeltaSync.runMetaSync(DeltaSync.java:719)
at
org.apache.hudi.utilities.deltastreamer.DeltaSync.writeToSink(DeltaSync.java:637)
at
org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:337)
at
org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.lambda$sync$2(HoodieDeltaStreamer.java:204)
at org.apache.hudi.common.util.Option.ifPresent(Option.java:97)
at
org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.sync(HoodieDeltaStreamer.java:202)
at
org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:571)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at
org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at
org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:955)
at
org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
at
org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1043)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1052)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: org.apache.hudi.exception.HoodieException: Got runtime exception
when hive syncing test_table
at
org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:145)
at
org.apache.hudi.sync.common.util.SyncUtilHelpers.runHoodieMetaSync(SyncUtilHelpers.java:56)
... 19 more
Caused by: org.apache.hudi.hive.HoodieHiveSyncException: Failed to sync
partitions for table test_table
at
org.apache.hudi.hive.HiveSyncTool.syncPartitions(HiveSyncTool.java:341)
at
org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:232)
at org.apache.hudi.hive.HiveSyncTool.doSync(HiveSyncTool.java:154)
at
org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:142)
... 20 more
Caused by: org.apache.hudi.aws.sync.HoodieGlueSyncException: Fail to add
partitions to default.test_table
at
org.apache.hudi.aws.sync.AWSGlueCatalogSyncClient.addPartitionsToTable(AWSGlueCatalogSyncClient.java:147)
at
org.apache.hudi.hive.HiveSyncTool.syncPartitions(HiveSyncTool.java:324)
... 23 more
Caused by:
org.apache.hudi.com.amazonaws.services.glue.model.InvalidInputException: The
number of partition keys do not match the number of partition values (Service:
AWSGlue; Status Code: 400; Error Code: InvalidInputException; Request ID:
e8d9adf2-13c4-4589-bbec-c578a827749f; Proxy: null)
at
org.apache.hudi.com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1819)
at
org.apache.hudi.com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleServiceErrorResponse(AmazonHttpClient.java:1403)
at
org.apache.hudi.com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1372)
at
org.apache.hudi.com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1145)
at
org.apache.hudi.com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:802)
at
org.apache.hudi.com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:770)
at
org.apache.hudi.com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:744)
at
org.apache.hudi.com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:704)
at
org.apache.hudi.com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:686)
at
org.apache.hudi.com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:550)
at
org.apache.hudi.com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:530)
at
org.apache.hudi.com.amazonaws.services.glue.AWSGlueClient.doInvoke(AWSGlueClient.java:10640)
at
org.apache.hudi.com.amazonaws.services.glue.AWSGlueClient.invoke(AWSGlueClient.java:10607)
at
org.apache.hudi.com.amazonaws.services.glue.AWSGlueClient.invoke(AWSGlueClient.java:10596)
at
org.apache.hudi.com.amazonaws.services.glue.AWSGlueClient.executeBatchCreatePartition(AWSGlueClient.java:259)
at
org.apache.hudi.com.amazonaws.services.glue.AWSGlueClient.batchCreatePartition(AWSGlueClient.java:228)
at
org.apache.hudi.aws.sync.AWSGlueCatalogSyncClient.addPartitionsToTable(AWSGlueCatalogSyncClient.java:139)
... 24 more {code}
The exception is thrown because the partition path values for meta sync are not
properly extracted. "hoodie.datasource.hive_sync.partition_extractor_class"
determines the partition extractor to use and in such a case, the
`MultiPartKeysValueExtractor` is inferred to be used. The root cause is that,
this extractor split the parts by slashes. If user specifies the output
dateformat which contains slashes, that is going to fail the extraction.
The fix is to introduce a new partition extractor so that we treat the
partition as a whole when there is only a single partition column, instead of
relying on `MultiPartKeysValueExtractor`.
> Meta sync throws exception if TimestampBasedKeyGenerator is used to generate
> partition path containing slashes
> --------------------------------------------------------------------------------------------------------------
>
> Key: HUDI-4966
> URL: https://issues.apache.org/jira/browse/HUDI-4966
> Project: Apache Hudi
> Issue Type: Bug
> Reporter: Ethan Guo
> Assignee: Ethan Guo
> Priority: Critical
> Fix For: 0.12.1
>
>
> For Deltastreamer, when using TimestampBasedKeyGenerator with the output
> format of partition path containing slashes, e.g., "yyyy/MM/dd", and
> hive-style partitioning disabled (by default), the meta sync fails.
> {code:java}
> --hoodie-conf hoodie.datasource.write.partitionpath.field=createdDate
> --hoodie-conf
> hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.TimestampBasedKeyGenerator
> --hoodie-conf hoodie.deltastreamer.keygen.timebased.timezone=GMT
> --hoodie-conf
> hoodie.deltastreamer.keygen.timebased.output.dateformat=yyyy/MM/dd
> --hoodie-conf
> hoodie.deltastreamer.keygen.timebased.timestamp.type=EPOCHMILLISECONDS {code}
> Hive Sync exception:
> {code:java}
> Exception in thread "main" org.apache.hudi.exception.HoodieException: Could
> not sync using the meta sync class org.apache.hudi.hive.HiveSyncTool
> at
> org.apache.hudi.sync.common.util.SyncUtilHelpers.runHoodieMetaSync(SyncUtilHelpers.java:58)
> at
> org.apache.hudi.utilities.deltastreamer.DeltaSync.runMetaSync(DeltaSync.java:719)
> at
> org.apache.hudi.utilities.deltastreamer.DeltaSync.writeToSink(DeltaSync.java:637)
> at
> org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:337)
> at
> org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.lambda$sync$2(HoodieDeltaStreamer.java:204)
> at org.apache.hudi.common.util.Option.ifPresent(Option.java:97)
> at
> org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.sync(HoodieDeltaStreamer.java:202)
> at
> org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:571)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at
> org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
> at
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:955)
> at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
> at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
> at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
> at
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1043)
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1052)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: org.apache.hudi.exception.HoodieException: Got runtime exception
> when hive syncing test_table
> at
> org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:145)
> at
> org.apache.hudi.sync.common.util.SyncUtilHelpers.runHoodieMetaSync(SyncUtilHelpers.java:56)
> ... 19 more
> Caused by: org.apache.hudi.hive.HoodieHiveSyncException: Failed to sync
> partitions for table test_table
> at org.apache.hudi.hive.HiveSyncTool.syncPartitions(HiveSyncTool.java:341)
> at
> org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:232)
> at org.apache.hudi.hive.HiveSyncTool.doSync(HiveSyncTool.java:154)
> at
> org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:142)
> ... 20 more
> Caused by: org.apache.hudi.hive.HoodieHiveSyncException: default.test_table
> add partition failed
> at
> org.apache.hudi.hive.ddl.HMSDDLExecutor.addPartitionsToTable(HMSDDLExecutor.java:217)
> at
> org.apache.hudi.hive.HoodieHiveSyncClient.addPartitionsToTable(HoodieHiveSyncClient.java:107)
> at org.apache.hudi.hive.HiveSyncTool.syncPartitions(HiveSyncTool.java:324)
> ... 23 more
> Caused by: MetaException(message:Invalid partition key & values; keys
> [createddate, ], values [2022, 10, 02, ])
> at
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$add_partitions_req_result$add_partitions_req_resultStandardScheme.read(ThriftHiveMetastore.java)
> at
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$add_partitions_req_result$add_partitions_req_resultStandardScheme.read(ThriftHiveMetastore.java)
> at
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$add_partitions_req_result.read(ThriftHiveMetastore.java)
> at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:88)
> at
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_add_partitions_req(ThriftHiveMetastore.java:1911)
> at
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.add_partitions_req(ThriftHiveMetastore.java:1898)
> at
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.add_partitions(HiveMetaStoreClient.java:625)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:173)
> at com.sun.proxy.$Proxy44.add_partitions(Unknown Source)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:2327)
> at com.sun.proxy.$Proxy44.add_partitions(Unknown Source)
> at
> org.apache.hudi.hive.ddl.HMSDDLExecutor.addPartitionsToTable(HMSDDLExecutor.java:212)
> ... 25 more{code}
> Glue Sync exception:
> {code:java}
> Exception in thread "main" org.apache.hudi.exception.HoodieException: Could
> not sync using the meta sync class
> org.apache.hudi.aws.sync.AwsGlueCatalogSyncTool
> at
> org.apache.hudi.sync.common.util.SyncUtilHelpers.runHoodieMetaSync(SyncUtilHelpers.java:58)
> at
> org.apache.hudi.utilities.deltastreamer.DeltaSync.runMetaSync(DeltaSync.java:719)
> at
> org.apache.hudi.utilities.deltastreamer.DeltaSync.writeToSink(DeltaSync.java:637)
> at
> org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:337)
> at
> org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.lambda$sync$2(HoodieDeltaStreamer.java:204)
> at org.apache.hudi.common.util.Option.ifPresent(Option.java:97)
> at
> org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.sync(HoodieDeltaStreamer.java:202)
> at
> org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:571)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at
> org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
> at
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:955)
> at
> org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
> at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
> at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
> at
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1043)
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1052)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: org.apache.hudi.exception.HoodieException: Got runtime exception
> when hive syncing test_table
> at
> org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:145)
> at
> org.apache.hudi.sync.common.util.SyncUtilHelpers.runHoodieMetaSync(SyncUtilHelpers.java:56)
> ... 19 more
> Caused by: org.apache.hudi.hive.HoodieHiveSyncException: Failed to sync
> partitions for table test_table
> at
> org.apache.hudi.hive.HiveSyncTool.syncPartitions(HiveSyncTool.java:341)
> at
> org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:232)
> at org.apache.hudi.hive.HiveSyncTool.doSync(HiveSyncTool.java:154)
> at
> org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:142)
> ... 20 more
> Caused by: org.apache.hudi.aws.sync.HoodieGlueSyncException: Fail to add
> partitions to default.test_table
> at
> org.apache.hudi.aws.sync.AWSGlueCatalogSyncClient.addPartitionsToTable(AWSGlueCatalogSyncClient.java:147)
> at
> org.apache.hudi.hive.HiveSyncTool.syncPartitions(HiveSyncTool.java:324)
> ... 23 more
> Caused by:
> org.apache.hudi.com.amazonaws.services.glue.model.InvalidInputException: The
> number of partition keys do not match the number of partition values
> (Service: AWSGlue; Status Code: 400; Error Code: InvalidInputException;
> Request ID: e8d9adf2-13c4-4589-bbec-c578a827749f; Proxy: null)
> at
> org.apache.hudi.com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1819)
> at
> org.apache.hudi.com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleServiceErrorResponse(AmazonHttpClient.java:1403)
> at
> org.apache.hudi.com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1372)
> at
> org.apache.hudi.com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1145)
> at
> org.apache.hudi.com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:802)
> at
> org.apache.hudi.com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:770)
> at
> org.apache.hudi.com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:744)
> at
> org.apache.hudi.com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:704)
> at
> org.apache.hudi.com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:686)
> at
> org.apache.hudi.com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:550)
> at
> org.apache.hudi.com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:530)
> at
> org.apache.hudi.com.amazonaws.services.glue.AWSGlueClient.doInvoke(AWSGlueClient.java:10640)
> at
> org.apache.hudi.com.amazonaws.services.glue.AWSGlueClient.invoke(AWSGlueClient.java:10607)
> at
> org.apache.hudi.com.amazonaws.services.glue.AWSGlueClient.invoke(AWSGlueClient.java:10596)
> at
> org.apache.hudi.com.amazonaws.services.glue.AWSGlueClient.executeBatchCreatePartition(AWSGlueClient.java:259)
> at
> org.apache.hudi.com.amazonaws.services.glue.AWSGlueClient.batchCreatePartition(AWSGlueClient.java:228)
> at
> org.apache.hudi.aws.sync.AWSGlueCatalogSyncClient.addPartitionsToTable(AWSGlueCatalogSyncClient.java:139)
> ... 24 more {code}
> The exception is thrown because the partition path values for meta sync are
> not properly extracted.
> "hoodie.datasource.hive_sync.partition_extractor_class" determines the
> partition extractor to use and in such a case, the
> `MultiPartKeysValueExtractor` is inferred to be used. The root cause is
> that, this extractor split the parts by slashes. If user specifies the
> output dateformat which contains slashes, that is going to fail the
> extraction.
> The fix is to introduce a new partition extractor so that we treat the
> partition as a whole when there is only a single partition column, instead of
> relying on `MultiPartKeysValueExtractor`.
>
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)