majian1998 opened a new pull request, #10010: URL: https://github.com/apache/hudi/pull/10010
In addition to real-time monitoring metrics, Hudi also produces result metrics, such as the read and write IO of clustering. These metrics are useful for continuously observing the status of table services. However, the existing metrics reporters either write to the console or to memory without persistence, or send metrics to a separate metrics server, which requires a complex environment setup. This PR provides a simple persistent reporter that lets users store metrics in the file system in JSON format.

Ideally, the latest metrics would be written to the file system by invoking `shutdown` from a shutdown hook when the job finishes. However, by that point the Hudi file system has already closed its connection pool, so the file can no longer be written. Instead, the file is updated when `shutdown` is called explicitly at the end of the job. `HoodieSparkSqlWriter.cleanup()` already calls `shutdown` explicitly, so metrics are reported at the end of the write process; doing the same in the table services achieves the same effect.

### Change Logs

Provides a file-system-based metrics reporter.

### Impact

Adds reporter-related parameters: `FILESYSTEM` is added as a value of `hoodie.metrics.reporter.type`, and new `FILESYSTEM` options specify the output path, the file naming, and whether the metrics file is written on a schedule.

### Risk level (write none, low medium or high below)

LOW

### Documentation Update

The metrics reporter type now supports `FILESYSTEM`.

Updated parameters:
- `hoodie.metrics.reporter.type`: `FILESYSTEM` has been added.

New parameters:
- `hoodie.metrics.filesystem.reporter.path`: the path for persisting Hudi storage metrics files.
- `hoodie.metrics.filesystem.metric.prefix`: the prefix for Hudi storage metrics file names.
- `hoodie.metrics.filesystem.overwrite.file`: whether to overwrite the same metrics file for the same table.
- `hoodie.metrics.filesystem.schedule.enable`: whether to periodically write metrics to the file system. Defaults to off, in which case only the final result is written to the file system.
- `hoodie.metrics.filesystem.report.period.seconds`: the file system reporting period in seconds. Defaults to 60.

### Contributor's checklist

- [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed
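The reporter behaviour described above can be illustrated with a minimal sketch. It assumes plain `java.nio` on a local directory instead of Hudi's file system layer, and a simple `Map<String, Long>` instead of Hudi's metrics registry. The class `FileSystemMetricsReporterSketch`, its methods, the default path and prefix, and the timestamped file name used when overwrite is disabled are all hypothetical; only the configuration keys, the schedule being off by default, and the 60-second default period come from this PR. The key design point is that the final snapshot is written by an explicit `shutdown()` call while the file system is still usable, not by a JVM shutdown hook.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical, simplified sketch of a file-system JSON metrics reporter (not Hudi's actual class). */
public class FileSystemMetricsReporterSketch {
  // Config keys as listed in this PR; defaults for path, prefix and overwrite are assumptions.
  static final String PATH_KEY = "hoodie.metrics.filesystem.reporter.path";
  static final String PREFIX_KEY = "hoodie.metrics.filesystem.metric.prefix";
  static final String OVERWRITE_KEY = "hoodie.metrics.filesystem.overwrite.file";
  static final String SCHEDULE_KEY = "hoodie.metrics.filesystem.schedule.enable";
  static final String PERIOD_KEY = "hoodie.metrics.filesystem.report.period.seconds";

  private final Path dir;
  private final String prefix;
  private final boolean overwrite;
  private final boolean scheduled;
  private final long periodSeconds;
  private final String tableName;
  private final Map<String, Long> metrics = new TreeMap<>(); // sorted keys give stable JSON output
  private ScheduledExecutorService scheduler;

  public FileSystemMetricsReporterSketch(Properties props, String tableName) {
    this.dir = Paths.get(props.getProperty(PATH_KEY, "/tmp/hudi-metrics"));
    this.prefix = props.getProperty(PREFIX_KEY, "metrics");
    this.overwrite = Boolean.parseBoolean(props.getProperty(OVERWRITE_KEY, "true"));
    this.scheduled = Boolean.parseBoolean(props.getProperty(SCHEDULE_KEY, "false")); // off by default
    this.periodSeconds = Long.parseLong(props.getProperty(PERIOD_KEY, "60"));         // 60 s by default
    this.tableName = tableName;
  }

  public synchronized void update(String name, long value) {
    metrics.put(name, value);
  }

  /** Starts periodic writes only when scheduled reporting is enabled. */
  public void start() {
    if (!scheduled) {
      return; // default: only the final result is written, by shutdown()
    }
    scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
      Thread t = new Thread(r, "fs-metrics-reporter");
      t.setDaemon(true); // do not keep the JVM alive because of this thread
      return t;
    });
    scheduler.scheduleAtFixedRate(() -> {
      try {
        write();
      } catch (IOException e) {
        throw new UncheckedIOException(e); // an exception suppresses further scheduled runs
      }
    }, periodSeconds, periodSeconds, TimeUnit.SECONDS);
  }

  /** Called explicitly at the end of a write or table service, while the file system is still open. */
  public Path shutdown() throws IOException {
    if (scheduler != null) {
      scheduler.shutdown(); // cancels future periodic runs
    }
    return write(); // final, most recent snapshot
  }

  private synchronized Path write() throws IOException {
    Files.createDirectories(dir);
    // overwrite=true reuses one file per table; otherwise a timestamp is appended (naming is an assumption)
    String name = overwrite
        ? prefix + "-" + tableName + ".json"
        : prefix + "-" + tableName + "-" + System.currentTimeMillis() + ".json";
    Path file = dir.resolve(name);
    Files.write(file, toJson().getBytes(StandardCharsets.UTF_8)); // creates or truncates the file
    return file;
  }

  synchronized String toJson() {
    StringBuilder sb = new StringBuilder("{");
    for (Map.Entry<String, Long> e : metrics.entrySet()) {
      if (sb.length() > 1) {
        sb.append(',');
      }
      // metric names are assumed not to need JSON escaping
      sb.append('"').append(e.getKey()).append("\":").append(e.getValue());
    }
    return sb.append('}').toString();
  }
}
```

In this sketch a caller constructs the reporter, calls `start()`, records values with `update(...)` as they are produced, and calls `shutdown()` at the end of the job, mirroring the explicit call made from `HoodieSparkSqlWriter.cleanup()`.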
