3286360470 opened a new issue, #17090:
URL: https://github.com/apache/pulsar/issues/17090
### Motivation
1. Currently, Pulsar does not have a quantile metric for read and write
latency, which is essential for troubleshooting and performance comparison
tests.
2. We also need network I/O metrics to gauge the current network processing
pressure, so that too many requests do not hang the broker.
### Goal
1. Our goal is to measure the latency that begins when the readHandler is
initialized and ends when the messages are read from the cache or the
underlying storage (e.g. BookKeeper / HDFS / ...).
Why we only measure the latency between readHandler initialization and the
point at which messages are found:
1.1. When a fetch request arrives at the broker, there may be no new messages
for the consumer. The broker then waits and periodically checks whether new
messages have arrived, which can make the cache read latency appear longer than
a remote read.
1.2. Measuring the read latency of the cache / BookKeeper / HDFS / ... is
enough for troubleshooting and performance comparison tests.
2. We only record the number of idle network processing threads and their
percentage of the total number of I/O threads.
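
The measurement window in goal 1 can be sketched with the standard library alone. This is a minimal illustration, not the proposed implementation: `ReadLatencyRecorder` and its methods are hypothetical names, and the nearest-rank quantile stands in for whatever the stats provider actually computes. The start timestamp is assumed to be taken when the readHandler is created for a read that has data to fetch, so time spent waiting for new messages is not included.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper for illustration; not part of Pulsar. */
class ReadLatencyRecorder {
    private final List<Long> samplesMicros = new ArrayList<>();

    /** Records one sample: handler creation time to entries-found time. */
    synchronized void record(long startNanos, long foundNanos) {
        samplesMicros.add(TimeUnit.NANOSECONDS.toMicros(foundNanos - startNanos));
    }

    /** Nearest-rank quantile for q in (0, 1]; returns -1 when there are no samples. */
    synchronized long quantileMicros(double q) {
        if (samplesMicros.isEmpty()) {
            return -1;
        }
        List<Long> sorted = new ArrayList<>(samplesMicros);
        Collections.sort(sorted);
        int rank = (int) Math.ceil(q * sorted.size());
        return sorted.get(Math.max(rank, 1) - 1);
    }
}
```

In a real read path the two timestamps would come from `System.nanoTime()` at handler creation and at the point the entries are returned from the cache or storage.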
### API Changes
1. Add an OpStats getter used to record the write latency:
```
public OpStatsLogger getOpStat();
```
2. Add a method that generates metrics for Netty thread pool usage:
```
private static void generateNetworkIdleMetrics(PulsarService pulsar,
                                               SimpleTextOutputStream stream);
```
### Implementation
1. Add an OpStats getter used to record the write latency:
```
public OpStatsLogger getOpStat() {
    return PulsarService.statsProvider.getStatsLogger("")
            .getOpStatsLogger("READ_ENTRY_LATENCY");
}
```
2. Add a method that generates metrics for Netty thread pool usage:
```
private static void generateNetworkIdleMetrics(PulsarService pulsar,
                                               SimpleTextOutputStream stream) {
    // Generate the network idle percentage metric.
    try {
        int busyExecutors = 0;
        EventLoopGroup workerGroup = pulsar.getBrokerService().executor();
        Iterator<EventExecutor> iterator = workerGroup.iterator();
        while (iterator.hasNext()) {
            SingleThreadEventExecutor next = (SingleThreadEventExecutor) iterator.next();
            // An executor with queued tasks is counted as busy.
            if (next.pendingTasks() > 0) {
                ++busyExecutors;
            }
        }
        int numIoThreads = pulsar.getConfiguration().getNumIOThreads();
        float ioWaitRatioMetric = (float) busyExecutors / (float) numIoThreads;
        // Metric: netIdlePercentile -> (1 - ioWaitRatioMetric) * 100
        writeNetIdlePercentileMetrics(stream, "brk_net_idle_percentile",
                (1 - ioWaitRatioMetric) * 100f,
                Collector.Type.GAUGE, clusterName, currentTimeMillis);
        // Generate the per-thread network I/O queue metrics.
        Iterator<EventExecutor> iterator1 = workerGroup.iterator();
        while (iterator1.hasNext()) {
            SingleThreadEventExecutor next = (SingleThreadEventExecutor) iterator1.next();
            int pendingTasks = next.pendingTasks();
            String name = "brk_pending_tasks_"
                    + StringUtils.replace(next.threadProperties().name(), "-", "_");
            // Metric: thread-name -> pendingTasks
            writeNetIdlePercentileMetrics(stream, name, pendingTasks,
                    Collector.Type.GAUGE, clusterName, currentTimeMillis);
        }
    } catch (Exception e) {
        log.error("generate network idle percent metrics failed", e);
    }
}
```
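
The arithmetic behind `brk_net_idle_percentile` and the per-thread metric names can be checked in isolation. The sketch below is only an illustration with no Netty or Pulsar dependency: `NetIdleMetrics` and its methods are hypothetical names, the pending-task counts are passed in as a plain array, and the thread name used in the example is made up.

```java
/** Hypothetical helper for illustration; not part of Pulsar. */
class NetIdleMetrics {

    /** Idle percentage: threads with no pending tasks, relative to numIoThreads. */
    static float idlePercent(int[] pendingTasksPerThread, int numIoThreads) {
        if (numIoThreads <= 0) {
            throw new IllegalArgumentException("numIoThreads must be positive");
        }
        int busyExecutors = 0;
        for (int pending : pendingTasksPerThread) {
            if (pending > 0) {
                ++busyExecutors;
            }
        }
        float busyRatio = (float) busyExecutors / (float) numIoThreads;
        return (1 - busyRatio) * 100f;
    }

    /** Builds the per-thread metric name by replacing '-' with '_'. */
    static String pendingTasksMetricName(String threadName) {
        return "brk_pending_tasks_" + threadName.replace('-', '_');
    }
}
```

For example, with 8 I/O threads of which 2 have queued tasks, the busy ratio is 0.25 and the idle percentage is 75. Note that a thread counts as busy only when its queue is non-empty at the moment of sampling, so the value is a point-in-time gauge.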
### Alternatives
_No response_
### Anything else?
_No response_