[GitHub] [pulsar] 3286360470 opened a new issue, #17090: PIP-XYZ: Support latency quantile metric for pulsar read and write

GitBox Sun, 14 Aug 2022 07:27:40 -0700


3286360470 opened a new issue, #17090:
URL: https://github.com/apache/pulsar/issues/17090


   ### Motivation
   
   1. Currently, Pulsar has no quantile metric for read and write latency. 
Such a metric is essential for troubleshooting and for performance comparison 
tests.
   2. We also need network IO metrics to gauge the current network processing 
load, so that an excess of requests does not hang the broker.
   
   
   
   
   ### Goal
   
   1. Our goal is to measure the read latency that begins when the readHandler 
is initialized and ends when the messages have been read from the cache or the 
underlying storage (e.g. BookKeeper, HDFS, ...).
   We measure only the interval between readHandler initialization and the 
point where the messages are found, for two reasons:
   1.1. When a fetch request arrives at the broker, there may be no new 
messages for the consumer. The broker then waits and periodically checks for 
new messages, so a latency measured from request arrival could make a cache 
read appear slower than a remote read.
   1.2. Measuring the read latency against the cache, BookKeeper, HDFS, etc. is 
enough for troubleshooting and performance comparison tests.
   
   2. For the network, we only measure the number of idle network IO threads 
and its percentage of the total number of IO threads.
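
To make the term "latency quantile" concrete, here is a minimal sketch of a nearest-rank quantile over a batch of recorded latencies. It is an illustration only, not the mechanism proposed here: the class name `LatencyQuantileSketch`, the method `quantile`, and the sample values are all hypothetical, and a real stats provider would typically maintain its quantiles incrementally rather than sort an array.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Pulsar: nearest-rank quantile over samples.
final class LatencyQuantileSketch {

    /** Returns the q-quantile (0 < q <= 1) of the samples using the nearest-rank method. */
    static long quantile(long[] samples, double q) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (q <= 0.0 || q > 1.0) {
            throw new IllegalArgumentException("q must be in (0, 1]");
        }
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        // Nearest rank: the smallest sample with at least q * n samples at or below it.
        int rank = (int) Math.ceil(q * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        // Made-up read latencies in microseconds; one slow outlier.
        long[] latenciesMicros = {120, 95, 300, 110, 5000, 130, 105, 98, 140, 115};
        System.out.println("p50=" + quantile(latenciesMicros, 0.50));
        System.out.println("p99=" + quantile(latenciesMicros, 0.99));
    }
}
```

With these made-up samples the median stays low while the 99th percentile is dominated by the single slow read, which is the kind of tail behaviour an average would hide.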
   
   
   ### API Changes
   
   1. Add an OpStats getter used to record latency:
   ```
   public OpStatsLogger getOpStat();
   ```
   2. Add a method that generates metrics for Netty thread pool usage:
   ```
   private static void generateNetworkIdleMetrics(PulsarService pulsar, SimpleTextOutputStream stream);
   ```
   
   ### Implementation
   
   1. Add an OpStats getter used to record latency:
   ```
   public OpStatsLogger getOpStat() {
       return PulsarService.statsProvider.getStatsLogger("").getOpStatsLogger("READ_ENTRY_LATENCY");
   }
   ```
   
   2. Add a method that generates metrics for Netty thread pool usage:
   ```
   private static void generateNetworkIdleMetrics(PulsarService pulsar, SimpleTextOutputStream stream) {
       // generate network idle percent metrics
       try {
           int busyExecutors = 0;
           EventLoopGroup workerGroup = pulsar.getBrokerService().executor();
           Iterator<EventExecutor> iterator = workerGroup.iterator();
           while (iterator.hasNext()) {
               SingleThreadEventExecutor next = (SingleThreadEventExecutor) iterator.next();
               // an executor counts as busy when it has queued tasks
               if (next.pendingTasks() > 0) {
                   ++busyExecutors;
               }
           }
           int numIoThreads = pulsar.getConfiguration().getNumIOThreads();
           // fraction of IO threads that currently have pending tasks
           float ioWaitRatioMetric = (float) busyExecutors / (float) numIoThreads;
           // Metric: netIdlePercentile -> (1 - ioWaitRatioMetric) * 100
           writeNetIdlePercentileMetrics(stream, "brk_net_idle_percentile", (1 - ioWaitRatioMetric) * 100f,
                   Collector.Type.GAUGE, clusterName, currentTimeMillis);
   
           // general network io queue metrics
           Iterator<EventExecutor> iterator1 = workerGroup.iterator();
           while (iterator1.hasNext()) {
               SingleThreadEventExecutor next = (SingleThreadEventExecutor) iterator1.next();
               int pendingTasks = next.pendingTasks();
               String name = "brk_pending_tasks_"
                       + StringUtils.replace(next.threadProperties().name(), "-", "_");
               // Metric: thread-name -> pendingTasks
               writeNetIdlePercentileMetrics(stream, name, pendingTasks,
                       Collector.Type.GAUGE, clusterName, currentTimeMillis);
           }
       } catch (Exception e) {
           log.error("generate network idle percent metrics failed", e);
       }
   }
   ```
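
The arithmetic in the method above can be checked in isolation. The sketch below is a simplified stand-in, not the proposed implementation: it takes the per-thread pending-task counts as a plain array instead of reading them from Netty, and it divides by the number of threads observed rather than by the configured `numIOThreads`. The class name `NetworkIdleSketch`, both method names, and the thread name in `main` are hypothetical.

```java
// Hypothetical stand-in for the idle-percentage arithmetic; no Netty or Pulsar types.
final class NetworkIdleSketch {

    /** Percentage of threads that have no pending tasks (100 when no threads are given). */
    static float idlePercent(int[] pendingTasksPerThread) {
        if (pendingTasksPerThread.length == 0) {
            return 100f; // assumption: an empty pool is reported as fully idle
        }
        int busy = 0;
        for (int pending : pendingTasksPerThread) {
            if (pending > 0) {
                busy++;
            }
        }
        float busyRatio = (float) busy / (float) pendingTasksPerThread.length;
        return (1f - busyRatio) * 100f;
    }

    /** Builds a per-thread metric name, replacing '-' with '_' as the proposal does. */
    static String pendingTasksMetricName(String threadName) {
        return "brk_pending_tasks_" + threadName.replace('-', '_');
    }

    public static void main(String[] args) {
        int[] pending = {0, 3, 0, 1}; // two of four threads have queued work
        System.out.println(idlePercent(pending));
        // "pulsar-io-4-1" is an assumed thread name, used only for illustration.
        System.out.println(pendingTasksMetricName("pulsar-io-4-1"));
    }
}
```

Guarding against an empty thread list avoids a division by zero; the original method has no such guard because it divides by the configured thread count.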
   
   ### Alternatives
   
   _No response_
   
   ### Anything else?
   
   _No response_

