majian1998 opened a new pull request, #10010:
URL: https://github.com/apache/hudi/pull/10010

   In addition to real-time monitoring metrics, Hudi also emits result 
metrics, such as read and write IO for clustering. These metrics are useful 
for tracking table service status over time.
   However, the existing metrics reporters either write to the console or to 
memory, with no persistence, or push to an external metrics server, which 
requires a complex environment setup. This PR provides a simple persistent 
reporter that lets users store metrics in the file system in JSON format.
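
As a rough illustration of what persisting metrics as JSON could look like, here is a minimal Python sketch. It is not the code in this PR: the function name, the file naming scheme, the metric names, and the meaning given to the overwrite flag are all assumptions made for the example.

```python
import json
import tempfile
import time
from pathlib import Path


def write_metrics_snapshot(directory, prefix, table_name, metrics, overwrite=True):
    """Persist one metrics snapshot as a JSON file and return its path.

    The naming scheme is hypothetical, not the one used by the PR.
    """
    target_dir = Path(directory)
    target_dir.mkdir(parents=True, exist_ok=True)
    if overwrite:
        # One stable file per table: each report replaces the previous one.
        name = f"{prefix}_{table_name}.json"
    else:
        # One file per report: a millisecond timestamp is appended so that
        # earlier reports are kept.
        name = f"{prefix}_{table_name}_{int(time.time() * 1000)}.json"
    target = target_dir / name
    target.write_text(json.dumps(metrics, indent=2, sort_keys=True))
    return target


# Usage with made-up metric names and values.
with tempfile.TemporaryDirectory() as tmp:
    path = write_metrics_snapshot(
        tmp,
        "hudi_metrics",
        "trips",
        {"clustering.bytes_read": 1024, "clustering.bytes_written": 512},
    )
    restored = json.loads(path.read_text())
```

A plain JSON file can be read back with any JSON parser, which is the main appeal over a console or in-memory reporter.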
   Ideally, the reporter would write the latest metrics to the file system 
from a shutdown hook when the job finishes. However, by the time the hook 
runs, the Hudi file system has already closed its connection pool, so the 
file can no longer be written. Instead, the file is updated by explicitly 
calling the shutdown function when the job finishes. Currently, 
HoodieSparkSqlWriter.cleanup() calls shutdown explicitly, so metrics are 
reported at the end of the write process. Doing the same in the table 
services achieves the same effect.
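
The ordering problem described above can be shown with a small, language-neutral Python sketch. The classes `FakeFileSystem` and `FileSystemReporter` and the metric name are hypothetical stand-ins, not Hudi classes; the sketch only shows that a final report issued after the file system is closed fails, while an explicit call during cleanup succeeds.

```python
class FakeFileSystem:
    """Stand-in for a file system client whose connections can be closed."""

    def __init__(self):
        self.closed = False
        self.files = {}

    def write(self, path, data):
        if self.closed:
            raise IOError("file system already closed")
        self.files[path] = data

    def close(self):
        self.closed = True


class FileSystemReporter:
    """Hypothetical reporter that flushes a final snapshot on shutdown."""

    def __init__(self, fs, path):
        self.fs = fs
        self.path = path
        self.metrics = {}

    def shutdown(self):
        self.fs.write(self.path, dict(self.metrics))


def run_write(fs, reporter, shutdown_in_cleanup):
    reporter.metrics["commit.count"] = 1  # made-up metric
    if shutdown_in_cleanup:
        reporter.shutdown()  # explicit call while the file system is open
    fs.close()  # teardown closes the connections
    if not shutdown_in_cleanup:
        reporter.shutdown()  # hook-style call arrives after teardown


# Explicit shutdown during cleanup: the final snapshot is written.
fs_ok = FakeFileSystem()
run_write(fs_ok, FileSystemReporter(fs_ok, "/metrics/trips.json"), True)

# Hook-style shutdown after teardown: the write fails.
fs_late = FakeFileSystem()
try:
    run_write(fs_late, FileSystemReporter(fs_late, "/metrics/trips.json"), False)
    late_failed = False
except IOError:
    late_failed = True
```
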
   
   ### Change Logs
   
   Provides a file system-based metrics reporter
   
   ### Impact
   
   Adds configuration related to the reporter:
   hoodie.metrics.reporter.type now accepts FILESYSTEM.
   The FILESYSTEM reporter has its own options for the path and naming of 
the metrics file and for whether scheduled writing is enabled.
   
   
   
   ### Risk level (write none, low medium or high below)
   
   LOW
   
   ### Documentation Update
   
   The metrics reporter type now supports FILESYSTEM.
   Updated parameters:
   hoodie.metrics.reporter.type - FILESYSTEM has been added as a value.
   
   New parameters:
   hoodie.metrics.filesystem.reporter.path - The path for persisting Hudi 
storage metrics files.
   hoodie.metrics.filesystem.metric.prefix - The prefix for Hudi storage 
metrics persistence file names.
   hoodie.metrics.filesystem.overwrite.file - Whether to overwrite the 
existing metrics file for the same table.
   hoodie.metrics.filesystem.schedule.enable - Whether to enable scheduled 
output of metrics to the file system. Disabled by default, in which case only 
the final result is written to the file system.
   hoodie.metrics.filesystem.report.period.seconds - File system reporting 
period in seconds. Defaults to 60.
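
For orientation, the parameters above could be assembled into writer options roughly as follows. This is a sketch under assumptions: the helper name, the example path and prefix, the default chosen for the overwrite flag, and the "true"/"false" string encoding of booleans are illustrative and not taken from the PR. `hoodie.metrics.on` is the existing Hudi switch that enables metrics.

```python
def filesystem_metrics_options(path, prefix, overwrite=True,
                               scheduled=False, period_seconds=60):
    """Build Hudi writer options for the FILESYSTEM reporter as strings.

    The overwrite default here is an assumption; the scheduled and
    period defaults mirror the ones stated in the parameter list.
    """
    return {
        "hoodie.metrics.on": "true",
        "hoodie.metrics.reporter.type": "FILESYSTEM",
        "hoodie.metrics.filesystem.reporter.path": path,
        "hoodie.metrics.filesystem.metric.prefix": prefix,
        "hoodie.metrics.filesystem.overwrite.file": str(overwrite).lower(),
        "hoodie.metrics.filesystem.schedule.enable": str(scheduled).lower(),
        "hoodie.metrics.filesystem.report.period.seconds": str(period_seconds),
    }


# Hypothetical path and prefix, for illustration only.
options = filesystem_metrics_options("/tmp/hudi_metrics", "trips_metrics")
```

In PySpark, a dictionary like this would typically be passed alongside the other Hudi write options, for example with `df.write.format("hudi").options(**options)`.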
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   

