[
https://issues.apache.org/jira/browse/HUDI-7107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Danny Chen closed HUDI-7107.
----------------------------
Resolution: Fixed
Fixed via master branch: eaba1146afc83e5e70ef520704a76a15a75c9aad
> Reused MetricsReporter fails to publish metrics in Spark streaming job
> ----------------------------------------------------------------------
>
> Key: HUDI-7107
> URL: https://issues.apache.org/jira/browse/HUDI-7107
> Project: Apache Hudi
> Issue Type: Bug
> Components: metrics
> Reporter: Akira Ajisaka
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.14.1
>
>
> A customer runs an AWS Glue 4.0 streaming job (based on Spark 3.3.0) using the
> Apache Hudi 0.14.0 libraries. The customer enabled the [Hudi CW metrics
> reporter|https://hudi.apache.org/docs/metrics/#aws-cloudwatchreporter] in the
> job. It succeeded in publishing metrics for the first batch but started failing
> afterwards, so only one data sample was published to CloudWatch metrics. The
> error stack trace is as follows:
> {noformat}
> 2023-11-09 15:59:17,775 ERROR [stream execution thread for [id = d31c62e2-e697-40b7-b6da-854cf9a8cb14, runId = 0f2325c6-83f4-4dfa-a849-3b3189423a9b]] cloudwatch.CloudWatchReporter (CloudWatchReporter.java:report(236)): Error reporting metrics to CloudWatch. The data in this CloudWatch request may have been discarded, and not made it to CloudWatch.
> java.util.concurrent.ExecutionException: org.apache.hudi.software.amazon.awssdk.core.exception.SdkClientException: Unable to execute HTTP request: event executor terminated
>     at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357) ~[?:1.8.0_382]
>     at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1928) ~[?:1.8.0_382]
>     at org.apache.hudi.aws.cloudwatch.CloudWatchReporter.report(CloudWatchReporter.java:234) ~[hudi-aws-bundle-0.14.0.jar:0.14.0]
>     at org.apache.hudi.aws.cloudwatch.CloudWatchReporter.report(CloudWatchReporter.java:211) ~[hudi-aws-bundle-0.14.0.jar:0.14.0]
>     at org.apache.hudi.com.codahale.metrics.ScheduledReporter.report(ScheduledReporter.java:237) ~[hudi-aws-bundle-0.14.0.jar:0.14.0]
>     at org.apache.hudi.metrics.cloudwatch.CloudWatchMetricsReporter.report(CloudWatchMetricsReporter.java:71) ~[hudi-utilities-bundle_2.12-0.14.0.jar:0.14.0]
>     at java.util.ArrayList.forEach(ArrayList.java:1259) ~[?:1.8.0_382]
>     at org.apache.hudi.metrics.Metrics.shutdown(Metrics.java:116) ~[hudi-utilities-bundle_2.12-0.14.0.jar:0.14.0]
>     at java.util.HashMap$Values.forEach(HashMap.java:982) ~[?:1.8.0_382]
>     at org.apache.hudi.metrics.Metrics.shutdownAllMetrics(Metrics.java:88) ~[hudi-utilities-bundle_2.12-0.14.0.jar:0.14.0]
>     at org.apache.hudi.HoodieSparkSqlWriter$.cleanup(HoodieSparkSqlWriter.scala:937) ~[hudi-utilities-bundle_2.12-0.14.0.jar:0.14.0]
>     at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:151) ~[hudi-utilities-bundle_2.12-0.14.0.jar:0.14.0] {noformat}
> This error comes from AWS Java SDK v2:
> {noformat}
> Caused by: java.util.concurrent.RejectedExecutionException: event executor terminated
>     at io.netty.util.concurrent.SingleThreadEventExecutor.reject(SingleThreadEventExecutor.java:923) ~[netty-common-4.1.74.Final.jar:4.1.74.Final]
>     at io.netty.util.concurrent.SingleThreadEventExecutor.offerTask(SingleThreadEventExecutor.java:350) ~[netty-common-4.1.74.Final.jar:4.1.74.Final]
>     at io.netty.util.concurrent.SingleThreadEventExecutor.addTask(SingleThreadEventExecutor.java:343) ~[netty-common-4.1.74.Final.jar:4.1.74.Final]
>     at io.netty.util.concurrent.SingleThreadEventExecutor.execute(SingleThreadEventExecutor.java:825) ~[netty-common-4.1.74.Final.jar:4.1.74.Final]
>     at io.netty.util.concurrent.SingleThreadEventExecutor.lazyExecute(SingleThreadEventExecutor.java:820) ~[netty-common-4.1.74.Final.jar:4.1.74.Final]
>     at io.netty.util.concurrent.AbstractScheduledEventExecutor.schedule(AbstractScheduledEventExecutor.java:263) ~[netty-common-4.1.74.Final.jar:4.1.74.Final]
>     at io.netty.util.concurrent.AbstractScheduledEventExecutor.schedule(AbstractScheduledEventExecutor.java:177) ~[netty-common-4.1.74.Final.jar:4.1.74.Final]
>     at io.netty.util.concurrent.AbstractEventExecutorGroup.schedule(AbstractEventExecutorGroup.java:50) ~[netty-common-4.1.74.Final.jar:4.1.74.Final]
>     at org.apache.hudi.software.amazon.awssdk.http.nio.netty.internal.DelegatingEventLoopGroup.schedule(DelegatingEventLoopGroup.java:153) ~[hudi-aws-bundle-0.14.0.jar:0.14.0]
> {noformat}
> I've observed that the MetricsReporter is shut down after the first batch;
> however, the same MetricsReporter instance is reused in subsequent batches,
> where it fails to report metrics.
> Given that MetricsReporter is not implemented to be reusable, we should avoid
> reusing it.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)