asafm commented on PR #18138:
URL: https://github.com/apache/pulsar/pull/18138#issuecomment-1285473616

   Thanks, @tjiuming for that prompt fix!
   I'll just expand a bit on the motivation:
   
   `SimpleTextOutputFormat` is the class that writes the Prometheus Text 
Format lines. Internally it writes them as bytes into a Netty `ByteBuf`, and 
that buffer is then flushed to the HTTP response of the `/metrics` endpoint 
(when Prometheus scrapes metrics from Pulsar).
   
   The Prometheus Text Format requires label values to be UTF-8 encoded, as 
written 
[here](https://github.com/Showmax/prometheus-docs/blob/master/content/docs/instrumenting/exposition_formats.md):
   > label_value can be any sequence of UTF-8 characters
   
   Today, as @tjiuming mentioned, the current implementation iterates over the 
String and writes each character without any defined encoding, i.e. it 
manipulates the `char` and then casts it to `byte`, which keeps only the low 
8 bits of the UTF-16 code unit:
   ```
   buffer.writeByte((byte) s.charAt(i));
   ```
   
   This causes Prometheus to fail the scrape, because it validates that the 
label value bytes in the response form a valid UTF-8 encoded string (which 
they generally do not once a label value contains non-ASCII characters).
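
To make the failure mode concrete, here is a minimal sketch that uses only the JDK (no Pulsar or Netty code). The class name `CastVsUtf8`, its methods, and the sample label value are hypothetical and only serve the illustration. It mimics the cast shown above and runs the result through a strict UTF-8 decoder, which is roughly the kind of validation a scraper would apply.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class CastVsUtf8 {

    // Mimics the old behaviour: keep only the low 8 bits of each UTF-16 code unit.
    public static byte[] castEncode(String s) {
        byte[] out = new byte[s.length()];
        for (int i = 0; i < s.length(); i++) {
            out[i] = (byte) s.charAt(i);
        }
        return out;
    }

    // Strict validation: malformed input is reported instead of being replaced.
    public static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical label value containing one non-ASCII character (U+00E9).
        String label = "caf\u00e9 topic";

        // The cast produces the lone byte 0xE9 followed by a space: not valid UTF-8.
        System.out.println(isValidUtf8(castEncode(label)));                        // false

        // Proper UTF-8 encoding emits the two-byte sequence 0xC3 0xA9 instead.
        System.out.println(isValidUtf8(label.getBytes(StandardCharsets.UTF_8)));   // true
    }
}
```

Pure ASCII label values pass either way, which is presumably why the problem only shows up once a label contains a non-ASCII character.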
   
   The fix, as @tjiuming mentioned, is to encode the characters as UTF-8 using 
the support provided by the `ByteBuf` class.
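
As a rough sketch of the idea (not the actual change in this PR), the writer below encodes every string as UTF-8 before appending it to the output. To keep it runnable with the JDK alone, a `ByteArrayOutputStream` stands in for Netty's `ByteBuf`; the class `Utf8LabelWriter` and the metric name are hypothetical. With Netty on the classpath, `ByteBufUtil.writeUtf8(ByteBuf, CharSequence)` is one existing way to do the same thing directly on the buffer.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class Utf8LabelWriter {

    // Stand-in for the Netty ByteBuf that backs the real output format.
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();

    // Encodes the whole string as UTF-8 instead of casting char by char.
    public Utf8LabelWriter write(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        buffer.write(utf8, 0, utf8.length);
        return this;
    }

    public byte[] toByteArray() {
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        // Hypothetical metric line with a non-ASCII label value.
        byte[] line = new Utf8LabelWriter()
                .write("pulsar_demo_metric{topic=\"")
                .write("caf\u00e9")
                .write("\"} 1\n")
                .toByteArray();

        // Decoding as UTF-8 gives back the original text, so a scraper can parse it.
        System.out.print(new String(line, StandardCharsets.UTF_8));
    }
}
```

Encoding the whole string at once also handles characters outside the Basic Multilingual Plane correctly, since surrogate pairs are encoded as a single four-byte sequence rather than two separate code units.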
   
   
   


