Akira Ajisaka created HUDI-7107:
-----------------------------------

             Summary: Reused MetricsReporter fails to publish metrics in Spark 
streaming job
                 Key: HUDI-7107
                 URL: https://issues.apache.org/jira/browse/HUDI-7107
             Project: Apache Hudi
          Issue Type: Bug
          Components: metrics
            Reporter: Akira Ajisaka


A customer runs an AWS Glue 4.0 streaming job (based on Spark 3.3.0) using the 
Apache Hudi 0.14.0 libraries. The customer enabled the [Hudi CW metrics 
reporter|https://hudi.apache.org/docs/metrics/#aws-cloudwatchreporter] in the 
job. The reporter published metrics successfully for the first batch but failed 
on subsequent batches, so only one data sample was published to CloudWatch 
metrics. The error stack trace is as follows:
{noformat}
2023-11-09 15:59:17,775 ERROR [stream execution thread for [id = d31c62e2-e697-40b7-b6da-854cf9a8cb14, runId = 0f2325c6-83f4-4dfa-a849-3b3189423a9b]] cloudwatch.CloudWatchReporter (CloudWatchReporter.java:report(236)): Error reporting metrics to CloudWatch. The data in this CloudWatch request may have been discarded, and not made it to CloudWatch.
java.util.concurrent.ExecutionException: org.apache.hudi.software.amazon.awssdk.core.exception.SdkClientException: Unable to execute HTTP request: event executor terminated
    at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357) ~[?:1.8.0_382]
    at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1928) ~[?:1.8.0_382]
    at org.apache.hudi.aws.cloudwatch.CloudWatchReporter.report(CloudWatchReporter.java:234) ~[hudi-aws-bundle-0.14.0.jar:0.14.0]
    at org.apache.hudi.aws.cloudwatch.CloudWatchReporter.report(CloudWatchReporter.java:211) ~[hudi-aws-bundle-0.14.0.jar:0.14.0]
    at org.apache.hudi.com.codahale.metrics.ScheduledReporter.report(ScheduledReporter.java:237) ~[hudi-aws-bundle-0.14.0.jar:0.14.0]
    at org.apache.hudi.metrics.cloudwatch.CloudWatchMetricsReporter.report(CloudWatchMetricsReporter.java:71) ~[hudi-utilities-bundle_2.12-0.14.0.jar:0.14.0]
    at java.util.ArrayList.forEach(ArrayList.java:1259) ~[?:1.8.0_382]
    at org.apache.hudi.metrics.Metrics.shutdown(Metrics.java:116) ~[hudi-utilities-bundle_2.12-0.14.0.jar:0.14.0]
    at java.util.HashMap$Values.forEach(HashMap.java:982) ~[?:1.8.0_382]
    at org.apache.hudi.metrics.Metrics.shutdownAllMetrics(Metrics.java:88) ~[hudi-utilities-bundle_2.12-0.14.0.jar:0.14.0]
    at org.apache.hudi.HoodieSparkSqlWriter$.cleanup(HoodieSparkSqlWriter.scala:937) ~[hudi-utilities-bundle_2.12-0.14.0.jar:0.14.0]
    at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:151) ~[hudi-utilities-bundle_2.12-0.14.0.jar:0.14.0]
{noformat}
The underlying cause is raised by the Netty event loop used by the AWS SDK for Java v2 HTTP client:
{noformat}
Caused by: java.util.concurrent.RejectedExecutionException: event executor terminated
 at io.netty.util.concurrent.SingleThreadEventExecutor.reject(SingleThreadEventExecutor.java:923) ~[netty-common-4.1.74.Final.jar:4.1.74.Final]
 at io.netty.util.concurrent.SingleThreadEventExecutor.offerTask(SingleThreadEventExecutor.java:350) ~[netty-common-4.1.74.Final.jar:4.1.74.Final]
 at io.netty.util.concurrent.SingleThreadEventExecutor.addTask(SingleThreadEventExecutor.java:343) ~[netty-common-4.1.74.Final.jar:4.1.74.Final]
 at io.netty.util.concurrent.SingleThreadEventExecutor.execute(SingleThreadEventExecutor.java:825) ~[netty-common-4.1.74.Final.jar:4.1.74.Final]
 at io.netty.util.concurrent.SingleThreadEventExecutor.lazyExecute(SingleThreadEventExecutor.java:820) ~[netty-common-4.1.74.Final.jar:4.1.74.Final]
 at io.netty.util.concurrent.AbstractScheduledEventExecutor.schedule(AbstractScheduledEventExecutor.java:263) ~[netty-common-4.1.74.Final.jar:4.1.74.Final]
 at io.netty.util.concurrent.AbstractScheduledEventExecutor.schedule(AbstractScheduledEventExecutor.java:177) ~[netty-common-4.1.74.Final.jar:4.1.74.Final]
 at io.netty.util.concurrent.AbstractEventExecutorGroup.schedule(AbstractEventExecutorGroup.java:50) ~[netty-common-4.1.74.Final.jar:4.1.74.Final]
 at org.apache.hudi.software.amazon.awssdk.http.nio.netty.internal.DelegatingEventLoopGroup.schedule(DelegatingEventLoopGroup.java:153) ~[hudi-aws-bundle-0.14.0.jar:0.14.0]
{noformat}
I've observed that the MetricsReporter is shut down after the first batch, but 
the same MetricsReporter instance is reused in subsequent batches, where it 
fails to report metrics.

Since MetricsReporter is not implemented to be reusable after shutdown, we 
should avoid reusing instances.
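
To illustrate the failure mode described above, here is a minimal sketch that uses only the JDK. It is not Hudi's code: SketchReporter, ReporterReuseSketch, and both counting methods are hypothetical names, and a plain ExecutorService stands in for the SDK's event loop. It assumes, as in the stack trace, that a reporter owns an executor that rejects work once it has been shut down.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;

class ReporterReuseSketch {

    /** Hypothetical stand-in for a reporter that owns an executor (not Hudi's class). */
    static class SketchReporter {
        private final ExecutorService executor = Executors.newSingleThreadExecutor();

        /** Returns true if the report ran, false if the stopped executor rejected it. */
        boolean report(String metric) {
            Runnable publish = () -> { /* pretend to send the metric somewhere */ };
            try {
                executor.submit(publish).get();
                return true;
            } catch (RejectedExecutionException e) {
                // Same class of error as "event executor terminated" in the trace.
                return false;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            } catch (ExecutionException e) {
                return false;
            }
        }

        /** Shuts the executor down; the instance cannot accept work afterwards. */
        void stop() {
            executor.shutdown();
        }
    }

    /** One reporter shared by all batches and stopped after each batch. */
    static int countPublishedWithReuse(int batches) {
        SketchReporter reporter = new SketchReporter();
        int published = 0;
        for (int i = 0; i < batches; i++) {
            if (reporter.report("batch-" + i)) {
                published++;
            }
            reporter.stop();
        }
        return published;
    }

    /** A fresh reporter per batch, stopped when the batch ends. */
    static int countPublishedWithFreshInstances(int batches) {
        int published = 0;
        for (int i = 0; i < batches; i++) {
            SketchReporter reporter = new SketchReporter();
            if (reporter.report("batch-" + i)) {
                published++;
            }
            reporter.stop();
        }
        return published;
    }

    public static void main(String[] args) {
        System.out.println(countPublishedWithReuse(3));          // prints 1
        System.out.println(countPublishedWithFreshInstances(3)); // prints 3
    }
}
```

With three batches, the reused reporter publishes only once, which matches the single data sample seen in CloudWatch, while creating a new reporter per batch publishes all three. Whether the actual fix should create new instances or stop shutting the reporter down between batches is a design decision for the Hudi code and is not settled by this sketch.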



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
