Shawn Chang created HUDI-5183:
---------------------------------
Summary: CloudWatch metrics won't work for CLI
Key: HUDI-5183
URL: https://issues.apache.org/jira/browse/HUDI-5183
Project: Apache Hudi
Issue Type: Bug
Reporter: Shawn Chang
This appears to be broken since this commit:
[https://github.com/apache/hudi/commit/9797fdfbb27ca8f5f06875ad958b597becc27a8d].
The commit makes the metrics prefix configurable, where earlier the table name
was used directly. For the metadata table, the prefix turns up empty when
metrics are published, because the inference mechanism does not find the table
name in the HoodieConfig it looks up.
This breaks publishing metrics to CloudWatch, since CloudWatch rejects a
Dimension with an empty value.
Exception:
{code:java}
22/03/07 22:25:07 ERROR CloudWatchReporter: Error reporting metrics to CloudWatch. The data in this CloudWatch request may have been discarded, and not made it to CloudWatch.
java.util.concurrent.ExecutionException: com.amazonaws.services.cloudwatch.model.MissingRequiredParameterException: The parameter MetricData.member.1.Dimensions.member.1.Value is required. (Service: AmazonCloudWatch; Status Code: 400; Error Code: MissingParameter; Request ID: 9b7c60fe-c872-4b92-8518-02bbe5e5e5b9; Proxy: null)
    at java.util.concurrent.FutureTask.report(FutureTask.java:122) ~[?:1.8.0_322]
    at java.util.concurrent.FutureTask.get(FutureTask.java:206) ~[?:1.8.0_322]
    at org.apache.hudi.aws.cloudwatch.CloudWatchReporter.report(CloudWatchReporter.java:234) ~[hudi-spark3-bundle_2.12-0.10.1-amzn-0-SNAPSHOT.jar:0.10.1-amzn-0-SNAPSHOT]
    at org.apache.hudi.aws.cloudwatch.CloudWatchReporter.report(CloudWatchReporter.java:212) ~[hudi-spark3-bundle_2.12-0.10.1-amzn-0-SNAPSHOT.jar:0.10.1-amzn-0-SNAPSHOT]
    at org.apache.hudi.com.codahale.metrics.ScheduledReporter.report(ScheduledReporter.java:237) ~[hudi-spark3-bundle_2.12-0.10.1-amzn-0-SNAPSHOT.jar:0.10.1-amzn-0-SNAPSHOT]
    at org.apache.hudi.com.codahale.metrics.ScheduledReporter.lambda$start$0(ScheduledReporter.java:177) ~[hudi-spark3-bundle_2.12-0.10.1-amzn-0-SNAPSHOT.jar:0.10.1-amzn-0-SNAPSHOT]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_322]
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [?:1.8.0_322]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [?:1.8.0_322]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [?:1.8.0_322]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_322]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_322]
    at java.lang.Thread.run(Thread.java:750) [?:1.8.0_322]
Caused by: com.amazonaws.services.cloudwatch.model.MissingRequiredParameterException: The parameter MetricData.member.1.Dimensions.member.1.Value is required. (Service: AmazonCloudWatch; Status Code: 400; Error Code: MissingParameter; Request ID: 9b7c60fe-c872-4b92-8518-02bbe5e5e5b9; Proxy: null)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1862) ~[aws-java-sdk-bundle-1.12.31.jar:?]
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleServiceErrorResponse(AmazonHttpClient.java:1415) ~[aws-java-sdk-bundle-1.12.31.jar:?]
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1384) ~[aws-java-sdk-bundle-1.12.31.jar:?]
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1154) ~[aws-java-sdk-bundle-1.12.31.jar:?]
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:811) ~[aws-java-sdk-bundle-1.12.31.jar:?]
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:779) ~[aws-java-sdk-bundle-1.12.31.jar:?]
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:753) ~[aws-java-sdk-bundle-1.12.31.jar:?]
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:713) ~[aws-java-sdk-bundle-1.12.31.jar:?]
    at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:695) ~[aws-java-sdk-bundle-1.12.31.jar:?]
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:559) ~[aws-java-sdk-bundle-1.12.31.jar:?]
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:539) ~[aws-java-sdk-bundle-1.12.31.jar:?]
    at com.amazonaws.services.cloudwatch.AmazonCloudWatchClient.doInvoke(AmazonCloudWatchClient.java:3084) ~[aws-java-sdk-bundle-1.12.31.jar:?]
    at com.amazonaws.services.cloudwatch.AmazonCloudWatchClient.invoke(AmazonCloudWatchClient.java:3051) ~[aws-java-sdk-bundle-1.12.31.jar:?]
    at com.amazonaws.services.cloudwatch.AmazonCloudWatchClient.invoke(AmazonCloudWatchClient.java:3040) ~[aws-java-sdk-bundle-1.12.31.jar:?]
    at com.amazonaws.services.cloudwatch.AmazonCloudWatchClient.executePutMetricData(AmazonCloudWatchClient.java:2559) ~[aws-java-sdk-bundle-1.12.31.jar:?]
    at com.amazonaws.services.cloudwatch.AmazonCloudWatchAsyncClient$30.call(AmazonCloudWatchAsyncClient.java:1314) ~[aws-java-sdk-bundle-1.12.31.jar:?]
    at com.amazonaws.services.cloudwatch.AmazonCloudWatchAsyncClient$30.call(AmazonCloudWatchAsyncClient.java:1308) ~[aws-java-sdk-bundle-1.12.31.jar:?]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_322]
    ... 3 more
{code}
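To illustrate the failure mode, here is a minimal sketch of a guard that could
sit in front of the reporter so an empty prefix never becomes a Dimension
value. The class `DimensionGuard`, the method `dimensionValue`, and the idea of
a fallback string are all hypothetical; nothing here is taken from the Hudi
code base or the AWS SDK.

```java
// Hypothetical helper, not part of Hudi: picks a non-empty value to use as
// the CloudWatch Dimension value for the table.
final class DimensionGuard {
  private DimensionGuard() {}

  // Returns the trimmed metric prefix when it is non-blank; otherwise returns
  // the trimmed fallback. Rejects a blank fallback so the caller can never
  // end up publishing an empty Dimension value.
  static String dimensionValue(String metricPrefix, String fallback) {
    if (metricPrefix != null && !metricPrefix.trim().isEmpty()) {
      return metricPrefix.trim();
    }
    if (fallback == null || fallback.trim().isEmpty()) {
      throw new IllegalArgumentException("fallback dimension value must be non-empty");
    }
    return fallback.trim();
  }
}
```

With this guard, `DimensionGuard.dimensionValue("", "unknown_table")` would
yield `unknown_table` where the request above was sent with an empty string.
This only masks the symptom; the table name would still be missing from the
config.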
Reproduction:
# Create a Hudi table (any table works)
# Start hudi-cli with metrics turned on (hoodie.metrics.on=true) and the metrics
reporter type set to CloudWatch (hoodie.metrics.reporter.type=CLOUDWATCH)
# Run commands like `cleans run` or `cluster schedule`
Thoughts:
- Passing the table name to sparkArgs in classes like `CleansCommand` would be
an easy way to unblock users. A better fix would be for `HoodieWriteConfig` to
pick up default values such as the table name from `HoodieTableConfig`
automatically.
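The second idea can be sketched with plain `java.util.Properties` standing in
for the two config objects. This is only an illustration under assumptions: the
class `TableNameDefaults` and the method `withTableNameDefault` are
hypothetical, and the key `hoodie.table.name` is assumed to be the table-name
key in both configs. It does not use the real `HoodieWriteConfig` or
`HoodieTableConfig` APIs.

```java
import java.util.Properties;

// Hypothetical sketch, not Hudi code: fills in a missing table name in the
// write-side properties from the table-side properties.
final class TableNameDefaults {
  // Assumed key for the table name in both property sets.
  static final String TABLE_NAME_KEY = "hoodie.table.name";

  private TableNameDefaults() {}

  // Returns a copy of writeProps. If the copy has no usable table name and
  // tableProps has one, the table name from tableProps is set on the copy.
  // An explicitly set, non-blank table name in writeProps always wins.
  static Properties withTableNameDefault(Properties writeProps, Properties tableProps) {
    Properties merged = new Properties();
    merged.putAll(writeProps);
    String current = merged.getProperty(TABLE_NAME_KEY);
    if (current == null || current.trim().isEmpty()) {
      String fromTable = tableProps.getProperty(TABLE_NAME_KEY);
      if (fromTable != null && !fromTable.trim().isEmpty()) {
        merged.setProperty(TABLE_NAME_KEY, fromTable.trim());
      }
    }
    return merged;
  }
}
```

Doing the defaulting in one place like this would cover every CLI command
without each one having to pass the table name through its own arguments.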