Shawn Chang created HUDI-5183:
---------------------------------

             Summary: CloudWatch metrics won't work for CLI
                 Key: HUDI-5183
                 URL: https://issues.apache.org/jira/browse/HUDI-5183
             Project: Apache Hudi
          Issue Type: Bug
            Reporter: Shawn Chang


This appears to have been broken since this commit: 
[https://github.com/apache/hudi/commit/9797fdfbb27ca8f5f06875ad958b597becc27a8d].
 The commit makes the metrics prefix configurable instead of using the table 
name directly, as was done previously. For the metadata table, the prefix turns 
up empty when metrics are published, because the inference mechanism does not 
find the table name in the HoodieConfig it looks up.

 

Publishing metrics to CloudWatch then fails, because CloudWatch rejects a 
Dimension with an empty value.
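
To illustrate the failure mode, here is a minimal, self-contained sketch of guarding a dimension value before it is published. It is not the actual CloudWatchReporter code: the class name, the "Table" dimension name, and the "unknown_table" fallback are all hypothetical and only show the idea of never emitting an empty value.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class DimensionGuard {
    // Hypothetical fallback used when no table name could be inferred.
    static final String UNKNOWN_TABLE = "unknown_table";

    /** Returns the inferred prefix, or the fallback when it is null or blank. */
    static String tableDimensionValue(String inferredPrefix) {
        if (inferredPrefix == null || inferredPrefix.trim().isEmpty()) {
            return UNKNOWN_TABLE;
        }
        return inferredPrefix;
    }

    /** Builds dimension name/value pairs; "Table" is an assumed dimension name. */
    static Map<String, String> buildDimensions(String inferredPrefix) {
        Map<String, String> dimensions = new LinkedHashMap<>();
        dimensions.put("Table", tableDimensionValue(inferredPrefix));
        return dimensions;
    }

    public static void main(String[] args) {
        // Empty prefix, as seen for the metadata table in this issue.
        System.out.println(buildDimensions(""));         // {Table=unknown_table}
        // Normal case: the prefix is passed through unchanged.
        System.out.println(buildDimensions("my_table")); // {Table=my_table}
    }
}
```

A guard like this would only hide the symptom; the empty prefix itself would still need to be fixed at the point where the config is built.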

 

Exception:

 
{code:java}
22/03/07 22:25:07 ERROR CloudWatchReporter: Error reporting metrics to 
CloudWatch. The data in this CloudWatch request may have been discarded, and 
not made it to CloudWatch.
java.util.concurrent.ExecutionException: 
com.amazonaws.services.cloudwatch.model.MissingRequiredParameterException: The 
parameter MetricData.member.1.Dimensions.member.1.Value is required. (Service: 
AmazonCloudWatch; Status Code: 400; Error Code: MissingParameter; Request ID: 
9b7c60fe-c872-4b92-8518-02bbe5e5e5b9; Proxy: null)
    at java.util.concurrent.FutureTask.report(FutureTask.java:122) 
~[?:1.8.0_322]
    at java.util.concurrent.FutureTask.get(FutureTask.java:206) ~[?:1.8.0_322]
    at 
org.apache.hudi.aws.cloudwatch.CloudWatchReporter.report(CloudWatchReporter.java:234)
 ~[hudi-spark3-bundle_2.12-0.10.1-amzn-0-SNAPSHOT.jar:0.10.1-amzn-0-SNAPSHOT]
    at 
org.apache.hudi.aws.cloudwatch.CloudWatchReporter.report(CloudWatchReporter.java:212)
 ~[hudi-spark3-bundle_2.12-0.10.1-amzn-0-SNAPSHOT.jar:0.10.1-amzn-0-SNAPSHOT]
    at 
org.apache.hudi.com.codahale.metrics.ScheduledReporter.report(ScheduledReporter.java:237)
 ~[hudi-spark3-bundle_2.12-0.10.1-amzn-0-SNAPSHOT.jar:0.10.1-amzn-0-SNAPSHOT]
    at 
org.apache.hudi.com.codahale.metrics.ScheduledReporter.lambda$start$0(ScheduledReporter.java:177)
 ~[hudi-spark3-bundle_2.12-0.10.1-amzn-0-SNAPSHOT.jar:0.10.1-amzn-0-SNAPSHOT]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
[?:1.8.0_322]
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) 
[?:1.8.0_322]
    at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
 [?:1.8.0_322]
    at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
 [?:1.8.0_322]
    at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
[?:1.8.0_322]
    at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
[?:1.8.0_322]
    at java.lang.Thread.run(Thread.java:750) [?:1.8.0_322]
Caused by: 
com.amazonaws.services.cloudwatch.model.MissingRequiredParameterException: The 
parameter MetricData.member.1.Dimensions.member.1.Value is required. (Service: 
AmazonCloudWatch; Status Code: 400; Error Code: MissingParameter; Request ID: 
9b7c60fe-c872-4b92-8518-02bbe5e5e5b9; Proxy: null)
    at 
com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1862)
 ~[aws-java-sdk-bundle-1.12.31.jar:?]
    at 
com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleServiceErrorResponse(AmazonHttpClient.java:1415)
 ~[aws-java-sdk-bundle-1.12.31.jar:?]
    at 
com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1384)
 ~[aws-java-sdk-bundle-1.12.31.jar:?]
    at 
com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1154)
 ~[aws-java-sdk-bundle-1.12.31.jar:?]
    at 
com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:811)
 ~[aws-java-sdk-bundle-1.12.31.jar:?]
    at 
com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:779)
 ~[aws-java-sdk-bundle-1.12.31.jar:?]
    at 
com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:753)
 ~[aws-java-sdk-bundle-1.12.31.jar:?]
    at 
com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:713)
 ~[aws-java-sdk-bundle-1.12.31.jar:?]
    at 
com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:695)
 ~[aws-java-sdk-bundle-1.12.31.jar:?]
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:559) 
~[aws-java-sdk-bundle-1.12.31.jar:?]
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:539) 
~[aws-java-sdk-bundle-1.12.31.jar:?]
    at 
com.amazonaws.services.cloudwatch.AmazonCloudWatchClient.doInvoke(AmazonCloudWatchClient.java:3084)
 ~[aws-java-sdk-bundle-1.12.31.jar:?]
    at 
com.amazonaws.services.cloudwatch.AmazonCloudWatchClient.invoke(AmazonCloudWatchClient.java:3051)
 ~[aws-java-sdk-bundle-1.12.31.jar:?]
    at 
com.amazonaws.services.cloudwatch.AmazonCloudWatchClient.invoke(AmazonCloudWatchClient.java:3040)
 ~[aws-java-sdk-bundle-1.12.31.jar:?]
    at 
com.amazonaws.services.cloudwatch.AmazonCloudWatchClient.executePutMetricData(AmazonCloudWatchClient.java:2559)
 ~[aws-java-sdk-bundle-1.12.31.jar:?]
    at 
com.amazonaws.services.cloudwatch.AmazonCloudWatchAsyncClient$30.call(AmazonCloudWatchAsyncClient.java:1314)
 ~[aws-java-sdk-bundle-1.12.31.jar:?]
    at 
com.amazonaws.services.cloudwatch.AmazonCloudWatchAsyncClient$30.call(AmazonCloudWatchAsyncClient.java:1308)
 ~[aws-java-sdk-bundle-1.12.31.jar:?]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_322]
    ... 3 more {code}
 

 

 

Reproduction:
 # Create a Hudi table; any table works
 # Start hudi-cli with metrics turned on (hoodie.metrics.on=true) and the 
metrics reporter type set to CloudWatch 
(hoodie.metrics.reporter.type=CLOUDWATCH)
 # Run commands like `cleans run` or `cluster schedule`

 

Thoughts:

- It would be easy to unblock users by passing the table name to sparkArgs in 
classes like `CleansCommand`. Ideally, though, `HoodieWriteConfig` would 
automatically set default values such as the table name by referring to 
`HoodieTableConfig`.
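
A rough sketch of the second idea, using plain java.util.Properties in place of the real Hudi config classes. The class and method names are hypothetical, and the `hoodie.table.name` key is assumed to be where both configs keep the table name; this only shows the defaulting logic, not how `HoodieWriteConfig` would actually be wired up.

```java
import java.util.Properties;

public class TableNameDefaulting {
    // Assumed key under which both configs store the table name.
    static final String TABLE_NAME_KEY = "hoodie.table.name";

    /**
     * Returns a copy of the write properties in which a missing or blank
     * table name is filled in from the table properties.
     */
    static Properties withTableDefaults(Properties writeProps, Properties tableProps) {
        Properties merged = new Properties();
        merged.putAll(writeProps);
        String current = merged.getProperty(TABLE_NAME_KEY);
        String fromTable = tableProps.getProperty(TABLE_NAME_KEY);
        boolean missing = current == null || current.trim().isEmpty();
        if (missing && fromTable != null) {
            merged.setProperty(TABLE_NAME_KEY, fromTable);
        }
        return merged;
    }

    public static void main(String[] args) {
        Properties tableProps = new Properties();
        tableProps.setProperty(TABLE_NAME_KEY, "trips");

        // Write config as built by a CLI command that never set the table name.
        Properties writeProps = new Properties();
        writeProps.setProperty("hoodie.metrics.on", "true");

        Properties merged = withTableDefaults(writeProps, tableProps);
        System.out.println(merged.getProperty(TABLE_NAME_KEY)); // trips
    }
}
```

An explicitly set table name in the write properties wins over the table-level value, so existing callers that already pass the name would see no change.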



