[GitHub] [pulsar] poorbarcode opened a new pull request, #16758: [improve][txn] PIP-160 Metrics stats of Transaction buffered writer

GitBox Thu, 08 Sep 2022 02:48:45 -0700


poorbarcode opened a new pull request, #16758:
URL: https://github.com/apache/pulsar/pull/16758

Master Issue: #15370

### Motivation

see #15370

### Modifications

I will complete proposal #15370 with these pull requests (*the current pull
request is part of step 7-1*):

1. Write the batch transaction log handler: `TxnLogBufferedWriter`
2. Configuration changes and protocol changes.
3. Transaction log store enables the batch feature.
4. Pending ack log store enables the batch feature.
5. Supports dynamic configuration.
6. Append admin API for transaction batch log and docs (admin and
configuration doc).
   - `GET /admin/v3/transactions/coordinatorStats`
   - `GET /admin/v3/transactions/pendingAckStats/:tenant/:namespace:/:topic:/:subName`
7. Append metrics support for transaction batch log.
7-1. Metrics of Txn Buffered Writer.
7-2. `TransactionLog` and `PendingAckStore` enable the Metrics of Txn
Buffered Writer.

----

### The desired effect

`TransactionLog` should create `TxnLogBufferedWriter` with params:

```JSON
{
"metricsPrefix": "pulsar_txn_tc",
"labelNames": "coordinatorId",
"labelValues": "1"
}
```

The metrics output of `TransactionLog` will look like this:

```
# A metric for how many batches were triggered due to the threshold "batchedWriteMaxRecords".
# TYPE pulsar_txn_tc_batched_log_batched_log_triggering_count_by_records Counter
pulsar_txn_tc_batched_log_batched_log_triggering_count_by_records{coordinatorId="1"} 15
...
...
...
# pulsar_txn_tc_batched_log_records_count_per_entry A metric for how many records were written per batch by the component [pulsar_txn_tc].
# TYPE pulsar_txn_tc_batched_log_records_count_per_entry Histogram
pulsar_txn_tc_batched_log_records_count_per_entry_bucket{coordinatorId="1",le="10"} 1
pulsar_txn_tc_batched_log_records_count_per_entry_bucket{coordinatorId="1",le="50"} 3
pulsar_txn_tc_batched_log_records_count_per_entry_bucket{coordinatorId="1",le="100"} 5
pulsar_txn_tc_batched_log_records_count_per_entry_bucket{coordinatorId="1",le="500"} 10
pulsar_txn_tc_batched_log_records_count_per_entry_bucket{coordinatorId="1",le="1000"} 10
pulsar_txn_tc_batched_log_records_count_per_entry_bucket{coordinatorId="1",le="+Inf"} 10
pulsar_txn_tc_batched_log_records_count_per_entry_count{coordinatorId="1"} 10
pulsar_txn_tc_batched_log_records_count_per_entry_sum{coordinatorId="1"} 5432
```
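
The `le` buckets of a Prometheus histogram are cumulative: each bucket counts every observation less than or equal to its bound, which is why the values in the sample never decrease. A minimal sketch of that counting rule, using the bucket bounds from the sample; the class name `BucketSketch` and its method are hypothetical and are not part of this PR:

```java
// Hypothetical illustration of cumulative histogram buckets; not code from this PR.
final class BucketSketch {
    // Upper bounds taken from the sample output; the +Inf bucket is implicit.
    static final double[] BOUNDS = {10, 50, 100, 500, 1000};

    /** Returns cumulative counts: one slot per bound, plus a final +Inf slot. */
    static long[] cumulativeBuckets(double[] observations) {
        long[] counts = new long[BOUNDS.length + 1];
        for (double value : observations) {
            // An observation is counted in every bucket whose bound is >= the value.
            for (int i = 0; i < BOUNDS.length; i++) {
                if (value <= BOUNDS[i]) {
                    counts[i]++;
                }
            }
            counts[BOUNDS.length]++; // the +Inf bucket matches every observation
        }
        return counts;
    }
}
```

With this rule, the `+Inf` bucket always equals the total number of observations, that is, the `_count` sample.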

`PendingAckStore` works the same way, except that the metrics of all
`PendingAckStore` instances are not differentiated by a subscription label
(because there are too many subscriptions).

----

### Manage the registered collectors ourselves.

To build a Metrics Stat, we need to execute these two steps:
1. Create a `Collector` (a `Histogram` or a `Counter`) and register it to the
`CollectorRegistry`.
2. Register the label values with the `Collector` and get the `Collector.Child`
(held by the Metrics Stat). This step can also be omitted, because we can call
`collector.labels(labelValues)` to get the `Collector.Child`.

In the Transaction log scenario, multiple Transaction Logs share the same
`Collector`, and each has its own `Collector.Child`, so when we build metrics
stat for each Transaction Log, we call `collector.labels(labelValues)` to get
the `Collector.Child`. However, the CollectorRegistry does not provide an API
like this:

```java
public Collector getRegisteredCollector(String name);
```

and it throws an `IllegalArgumentException` when we register a collector
with the same name more than once, see:

https://github.com/prometheus/client_java/blob/1966186deaaa83aec496d88ff604a90795bad688/simpleclient/src/main/java/io/prometheus/client/CollectorRegistry.java#L49-L65

So we have to manage the registered collectors ourselves.
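
As a rough sketch of what "managing the registered collectors ourselves" can mean: keep a name-keyed cache in front of the registry so that each collector is created and registered at most once, and later callers get the cached instance. To stay self-contained, the example uses a stand-in registry that only mimics the duplicate-name rejection; `StandInRegistry` and `CollectorCache` are hypothetical names and are not the classes introduced by this PR.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/** Stand-in (assumption) for a registry that rejects duplicate collector names. */
final class StandInRegistry {
    private final Set<String> names = new HashSet<>();

    synchronized void register(String name) {
        if (!names.add(name)) {
            throw new IllegalArgumentException("Collector already registered: " + name);
        }
    }
}

/** Hypothetical cache: registers a collector once per name, then reuses it. */
final class CollectorCache<C> {
    private final StandInRegistry registry;
    private final Map<String, C> collectors = new ConcurrentHashMap<>();

    CollectorCache(StandInRegistry registry) {
        this.registry = registry;
    }

    C getOrRegister(String name, Supplier<C> factory) {
        // computeIfAbsent runs the mapping function at most once per key,
        // so the registry never sees the same name twice through this cache.
        return collectors.computeIfAbsent(name, n -> {
            C created = factory.get();
            registry.register(n);
            return created;
        });
    }
}
```

Every Transaction Log that asks for the same metric name then shares one collector instance, and only the first request touches the registry.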

----

#### Each Metrics Stat instance holds its `Collector.Child`

To avoid the overhead of repeated `collector.labels(labelValues)` calls, each
Metrics Stat holds a reference to its `Collector.Child`, because this method
is not lightweight:

https://github.com/prometheus/client_java/blob/1966186deaaa83aec496d88ff604a90795bad688/simpleclient/src/main/java/io/prometheus/client/SimpleCollector.java#L63-L80
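
The idea can be sketched as follows: resolve the child once when the stats object is constructed, then update the held reference directly. The classes below are stand-ins written for illustration only (`StandInCounter` and `BufferedWriterStats` are hypothetical names); the `labels` method here models a map lookup per call and is not the Prometheus implementation.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.LongAdder;

/** Stand-in (assumption) for a labeled counter: one child per label-value tuple. */
final class StandInCounter {
    private final Map<List<String>, LongAdder> children = new ConcurrentHashMap<>();
    private final AtomicInteger lookups = new AtomicInteger();

    LongAdder labels(String... labelValues) {
        lookups.incrementAndGet(); // count how often the lookup path is taken
        List<String> key = Arrays.asList(labelValues.clone());
        return children.computeIfAbsent(key, k -> new LongAdder());
    }

    int lookupCount() {
        return lookups.get();
    }
}

/** Hypothetical stats object that holds its child instead of looking it up per update. */
final class BufferedWriterStats {
    private final LongAdder triggeredByRecords;

    BufferedWriterStats(StandInCounter counter, String... labelValues) {
        // The only lookup happens here, once per stats instance.
        this.triggeredByRecords = counter.labels(labelValues);
    }

    void recordTriggeredByRecords() {
        triggeredByRecords.increment(); // no label lookup on the hot path
    }

    long triggeredByRecords() {
        return triggeredByRecords.sum();
    }
}
```

The trade-off is that each stats instance keeps a reference to its child for its whole lifetime, in exchange for skipping the lookup on every update.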

----

#### Code will be removed in the next PR (7-2)

In the complete design, we should have two implementations, as in the UML
below: one for when the batch feature is enabled and another for when it is
disabled:

![uml](https://user-images.githubusercontent.com/25195800/182418488-6469a38f-a96c-44e9-8ee6-01273b58b0cd.jpeg)

To reduce later maintenance costs, I'd like to drop `DisabledMetricsStat` and
always use the implementation `MetricsStatimpl`, even if the Txn buffered
writer disables the batch feature. The constructor without the `metricsStats`
parameter and the related null checks will be removed in the next PR. They
exist only for compatibility while the work is split across PRs, which keeps
each PR smaller.

### Documentation

- [ ] `doc-required`

- [x] `doc-not-needed`

- [ ] `doc`

- [ ] `doc-complete`

