Wrekkers opened a new issue #3665:
URL: https://github.com/apache/hudi/issues/3665


   **Problem Description**
   
   I am using GraphiteReporter for Hudi metrics. My app runs on an Amazon EMR 
cluster as a Spark step, and Graphite runs on a K8s cluster via 
[prometheus/graphite_exporter](https://github.com/prometheus/graphite_exporter). 
When the job runs as an EMR step, I do not see any metrics. When I run the same 
code from spark-shell, Hudi publishes the metrics and they are visible on the 
Graphite server.
   On further investigation, I found that the class 
[MetricsGraphiteReporter](https://github.com/apache/hudi/blob/master/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metrics/MetricsGraphiteReporter.java)
 has a 30s delay before the metrics reporter starts.
   My hypothesis is that the EMR step finishes as soon as the Spark job 
completes the commit, so the socket connection between EMR and the Graphite 
server is never properly established. I do not see any warning or error logs.
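
The hypothesis above can be illustrated with a minimal, self-contained sketch. It does not use Hudi's `MetricsGraphiteReporter`; it assumes only that the reporter behaves like a periodic task with an initial delay, and it substitutes a plain `ScheduledExecutorService` for it. The names `DelayedReporterSketch` and `reportsSent` are hypothetical, and the 30s delay is scaled down to milliseconds so the example runs quickly.

```scala
import java.util.concurrent.{Executors, TimeUnit}
import java.util.concurrent.atomic.AtomicInteger

// Stand-in for a periodic metrics reporter (NOT Hudi's actual class).
object DelayedReporterSketch {

  // Schedules a "report" task with an initial delay, lets the "job" run for
  // jobMillis, then shuts the scheduler down the way an exiting driver would.
  // Returns how many reports were emitted before shutdown.
  def reportsSent(initialDelayMillis: Long, periodMillis: Long, jobMillis: Long): Int = {
    val sent = new AtomicInteger(0)
    val scheduler = Executors.newSingleThreadScheduledExecutor()
    scheduler.scheduleAtFixedRate(
      new Runnable { def run(): Unit = { sent.incrementAndGet(); () } },
      initialDelayMillis, periodMillis, TimeUnit.MILLISECONDS)

    Thread.sleep(jobMillis)     // lifetime of the job
    scheduler.shutdownNow()     // process exits; a still-pending first report is dropped
    scheduler.awaitTermination(1, TimeUnit.SECONDS)
    sent.get()
  }

  def main(args: Array[String]): Unit = {
    // Job shorter than the initial delay: the first report never fires.
    println(s"short job: ${reportsSent(300, 300, 50)} report(s)")
    // Job longer than the initial delay: at least one report fires.
    println(s"long job:  ${reportsSent(50, 50, 500)} report(s)")
  }
}
```

If the real reporter follows the same pattern, a step that finishes in under 30s would exit silently without publishing anything, while a long-lived spark-shell session would publish normally. That matches the behavior described above, but it remains an assumption until confirmed against the reporter's code.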
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1. Launch the prometheus/graphite_exporter 
[image](https://hub.docker.com/r/prom/graphite-exporter/) on a K8s cluster. In 
the same subnet, launch an AWS EMR cluster that can reach the Graphite server.
   2. Execute a step on EMR that performs a Hudi DataFrame write with metrics 
turned on and the following write config options:
   
   ```
   
   val nessieHudiMetricConfig: Map[String, String] = Map(
     METRICS_ON -> "true",
     METRICS_REPORTER_TYPE -> "GRAPHITE",
     GRAPHITE_SERVER_HOST -> "<IP of graphite server>",
     GRAPHITE_SERVER_PORT -> "<TCP PORT>",
     GRAPHITE_METRIC_PREFIX -> "nessie"
   )
   
   df.write.format("hudi").
     options(getQuickstartWriteConfigs).
     option(PRECOMBINE_FIELD_OPT_KEY, "ts").
     option(RECORDKEY_FIELD_OPT_KEY, "uuid").
     option(PARTITIONPATH_FIELD_OPT_KEY, "partitionpath").
     option(TABLE_NAME, tableName).
     options(nessieHudiMetricConfig).
     mode(Overwrite).
     save(basePath)
   ```
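
To separate a network problem from the reporter delay, it may help to confirm from inside the EMR step that the Graphite endpoint accepts data at all. The sketch below sends a single line in the Graphite plaintext format (`<path> <value> <epoch-seconds>`) over a TCP socket. The object name `GraphiteProbe`, the metric path `nessie.probe`, and the 5s connect timeout are assumptions chosen for illustration; the host and port are the same placeholders used in the config above.

```scala
import java.io.{OutputStreamWriter, PrintWriter}
import java.net.{InetSocketAddress, Socket}
import java.nio.charset.StandardCharsets

// Hypothetical helper for a one-off connectivity check; not part of Hudi.
object GraphiteProbe {

  // Builds one line of the Graphite plaintext protocol, newline-terminated.
  def plaintextLine(path: String, value: Double, epochSeconds: Long): String =
    s"$path $value $epochSeconds\n"

  // Opens a TCP connection, writes the line, and closes the socket.
  // Throws an IOException if the endpoint cannot be reached within the timeout.
  def send(host: String, port: Int, line: String, timeoutMillis: Int = 5000): Unit = {
    val socket = new Socket()
    try {
      socket.connect(new InetSocketAddress(host, port), timeoutMillis)
      val out = new PrintWriter(
        new OutputStreamWriter(socket.getOutputStream, StandardCharsets.UTF_8))
      out.print(line)
      out.flush()
    } finally {
      socket.close()
    }
  }
}
```

A call such as `GraphiteProbe.send(host, port, GraphiteProbe.plaintextLine("nessie.probe", 1.0, System.currentTimeMillis() / 1000))` placed right after the write would show whether the step can reach the server. If the probe metric arrives but the Hudi metrics do not, that points toward the reporter timing rather than connectivity.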
   
   
   **Expected behavior**
   
   The metrics should be published as soon as the write commit happens.
   
   **Environment Description**
   
   * Hudi version : 0.8.0
   
   * Spark version : 2.4.7
   
   * Running on EMR : yes
   
   
   **Additional context**
   
   No error logs/warnings.
   
   

