junmuz opened a new issue, #7646: URL: https://github.com/apache/paimon/issues/7646
### Search before asking

- [x] I searched in the [issues](https://github.com/apache/paimon/issues) and found nothing similar.

### Motivation

Paimon registers commit metrics (commit duration, files added/deleted, records appended, partitions written, etc.) through its internal MetricRegistry / MetricGroupImpl, and these are surfaced in the Spark UI via CustomTaskMetric. However, they are not accessible to external monitoring systems such as Prometheus, Graphite, or any JMX-based scraper.

As a result, in production environments that use Prometheus or other monitoring infrastructure to observe Spark applications, Paimon table-level commit metrics are invisible: operators cannot alert on or build dashboards for commit duration, write throughput, partition counts, etc. without parsing Spark UI pages or logs.

### Solution

Bridge Paimon's internal metrics into a Codahale MetricRegistry exposed as a Spark Source, with a dedicated JmxReporter so that MBeans are registered immediately. This requires three components:

1. SparkMetricGroup: a subclass of MetricGroupImpl that overrides counter(), gauge(), and histogram() to dual-register each metric: once in Paimon's internal map (preserving Spark UI integration) and once as a Codahale gauge in a shared MetricRegistry.
2. PaimonMetricsSource: a singleton Spark Source (it must live under org.apache.spark because of package-private visibility) that owns:
   - a Codahale MetricRegistry shared with all SparkMetricGroup instances;
   - a JmxReporter started eagerly on that registry, so MBeans appear as soon as metrics are added. This sidesteps a limitation of Spark's MetricsSystem, which snapshots the registry only at registerSource() time.
3. Wiring: SparkMetricRegistry.createMetricGroup() is updated to instantiate SparkMetricGroup (instead of a plain MetricGroupImpl), passing in the Codahale registry owned by the PaimonMetricsSource singleton.
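The dual-registration idea behind components 1 and 2 can be illustrated without any Paimon or Spark dependencies. This is a minimal sketch under stated assumptions, not the proposed implementation: to stay self-contained it registers gauges directly with the JDK's platform MBeanServer through javax.management instead of going through a Codahale MetricRegistry and JmxReporter, and every class and method name here (DualRegisteringMetricGroup, JmxGauge, readFromJmx) is hypothetical rather than part of Paimon's MetricGroupImpl API. It shows the two properties the proposal relies on: each metric is recorded in an internal map and exposed over JMX at the moment it is added, and every JMX read evaluates the underlying supplier, so scraped values stay live.

```java
import java.lang.management.ManagementFactory;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

/** Dependency-free sketch of dual registration; all names here are hypothetical. */
public class DualRegistrationSketch {

    /** Standard MBean interface: JMX exposes getValue() as the read-only attribute "Value". */
    public interface JmxGaugeMBean {
        long getValue();
    }

    /** Adapts a LongSupplier to the standard MBean interface (class name = interface name minus "MBean"). */
    public static class JmxGauge implements JmxGaugeMBean {
        private final LongSupplier supplier;

        public JmxGauge(LongSupplier supplier) {
            this.supplier = supplier;
        }

        @Override
        public long getValue() {
            return supplier.getAsLong(); // evaluated on every JMX read, so the value is live
        }
    }

    /** Stand-in for a SparkMetricGroup-like class: an internal map plus an eager JMX mirror. */
    public static class DualRegisteringMetricGroup {
        private final String table;
        private final String group;
        private final MBeanServer server;
        private final Map<String, LongSupplier> internal = new ConcurrentHashMap<>();

        public DualRegisteringMetricGroup(String table, String group, MBeanServer server) {
            this.table = table;
            this.group = group;
            this.server = server;
        }

        /** A counter is modelled as an AtomicLong that is also exposed as a gauge. */
        public AtomicLong counter(String name) {
            AtomicLong value = new AtomicLong();
            gauge(name, value::get);
            return value;
        }

        public void gauge(String name, LongSupplier supplier) {
            internal.put(name, supplier); // registration 1: internal map (the Spark UI path)
            try {
                ObjectName objectName = objectName(name);
                // Assumption of this sketch: a newer metric with the same name replaces the old MBean.
                if (server.isRegistered(objectName)) {
                    server.unregisterMBean(objectName);
                }
                // Registration 2: visible to JMX scrapers immediately, with no later snapshot step.
                server.registerMBean(new JmxGauge(supplier), objectName);
            } catch (JMException e) {
                throw new IllegalStateException("JMX registration failed for " + name, e);
            }
        }

        /** Builds paimon:name=<table>.<group>.<metric>, loosely mirroring the naming in the issue. */
        public ObjectName objectName(String metric) throws JMException {
            return new ObjectName("paimon", "name", table + "." + group + "." + metric);
        }

        /** Reads a metric back the way a JMX scraper would. */
        public long readFromJmx(String metric) {
            try {
                return (Long) server.getAttribute(objectName(metric), "Value");
            } catch (JMException e) {
                throw new IllegalStateException(e);
            }
        }

        public Map<String, LongSupplier> internalMetrics() {
            return internal;
        }
    }

    public static void main(String[] args) {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        DualRegisteringMetricGroup commit = new DualRegisteringMetricGroup("orders", "commit", server);
        AtomicLong addedFiles = commit.counter("addedFiles");
        addedFiles.addAndGet(3);
        // Both paths observe the same live value.
        System.out.println(commit.internalMetrics().get("addedFiles").getAsLong()); // 3
        System.out.println(commit.readFromJmx("addedFiles")); // 3
    }
}
```

Registering on add is what avoids the snapshot problem described for Spark's MetricsSystem: nothing has to be re-published after the Source is registered. The replace-on-duplicate behaviour in gauge() is a choice made for this sketch only, and the ObjectName layout produced by Codahale's JmxReporter may differ from the paimon:name=... form used here (for example, by including additional key properties).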
The V1 commit path (PaimonSparkWriter.commit()) is also wired with withMetricRegistry() so that its commit metrics flow through the same path.

The result is that all Paimon metrics appear under the paimon JMX domain (e.g. paimon.<table>.<group>.<metric>) and can be scraped by jmx_prometheus_javaagent or any other JMX-based monitoring tool, with zero impact on existing Spark UI metrics.

### Anything else?

N/A

### Are you willing to submit a PR?

- [x] I'm willing to submit a PR!
