[
https://issues.apache.org/jira/browse/KAFKA-13687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sergio Troiano updated KAFKA-13687:
-----------------------------------
Component/s: tools
> Limit number of batches when using kafka-dump-log.sh
> ----------------------------------------------------
>
> Key: KAFKA-13687
> URL: https://issues.apache.org/jira/browse/KAFKA-13687
> Project: Kafka
> Issue Type: Improvement
> Components: tools
> Reporter: Sergio Troiano
> Priority: Minor
> Labels: features
> Original Estimate: 96h
> Remaining Estimate: 96h
>
> Currently kafka-dump-log.sh reads the whole file(s) and dumps the
> contents of the segment file(s).
> As we know, the savings from combining compression and batching
> while producing (if the payloads are good candidates) are huge.
>
> We would like a way to "monitor" how the producers build their
> batches, as we do not always have access to the producer metrics.
> We have multitenant producers, so it is hard to "detect" when the usage is
> not optimal.
>
> The problem with the way DumpLogs currently works is that it reads the whole
> file. In a scenario with thousands of topics with different segment
> sizes (default is 1 GB), we could end up affecting the cluster balance, as we
> would be evicting useful pages from the page cache and replacing them with
> what we read from the files.
>
> As we only need a few samples from the segments to see the usage pattern
> while producing, we would like to add a new parameter called
> maxBatches.
>
> Based on the current script, the change is quite small, as it only needs a
> parameter and a counter.
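
To illustrate the "parameter and a counter" idea, here is a minimal Python sketch. It is not the actual DumpLogs code; the function name `limit_batches` and its argument names are hypothetical. It shows that the loop can stop as soon as the limit is reached, so the rest of the segment is never read.

```python
from typing import Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


def limit_batches(batches: Iterable[T], max_batches: Optional[int] = None) -> Iterator[T]:
    """Yield batches lazily, stopping after max_batches (None means no limit)."""
    if max_batches is not None and max_batches <= 0:
        return
    dumped = 0  # the counter
    for batch in batches:
        yield batch
        dumped += 1
        # Stop before pulling another batch from the underlying reader.
        if max_batches is not None and dumped >= max_batches:
            return
```

Because the check happens right after each batch is emitted, an iterator backed by a file would not be advanced past the last sampled batch.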
>
> With this change we could, for example, periodically take small
> samples and analyze the batch headers (looking at compresscodec and the
> batch count).
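
A rough sketch of that header analysis in Python follows. It assumes each batch header line in the dump output carries `count: N` and `compresscodec: X` fields; the exact line layout can differ between Kafka versions, and the function name `summarize_batches` is hypothetical.

```python
import re
from collections import Counter

# Assumed field names; lines lacking either field (e.g. per-record lines) are skipped.
CODEC_RE = re.compile(r"\bcompresscodec:\s*(\S+)", re.IGNORECASE)
COUNT_RE = re.compile(r"\bcount:\s*(\d+)")


def summarize_batches(dump_lines):
    """Count batches per codec and compute the average records per batch."""
    codecs = Counter()
    record_counts = []
    for line in dump_lines:
        codec = CODEC_RE.search(line)
        count = COUNT_RE.search(line)
        if codec is None or count is None:
            continue
        codecs[codec.group(1).lower()] += 1
        record_counts.append(int(count.group(1)))
    batches = len(record_counts)
    average = sum(record_counts) / batches if batches else 0.0
    return {
        "batches": batches,
        "codecs": dict(codecs),
        "avg_records_per_batch": average,
    }
```

A topic whose samples show mostly an uncompressed codec and an average close to one record per batch would be a candidate for a closer look.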
>
> With this we could build an automated tool that reads all the topics. Going
> further, when we see that a client is using neither compression nor
> batching, we could take the payloads of those samples and simulate
> compressing them (using batching and compression). With those numbers we
> could then approach the client with a proposal for saving money.
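
The compression simulation could be approximated along these lines. This is only a sketch under stated assumptions: gzip stands in for whichever codec the producer would use, Kafka's own record and batch framing overhead is ignored, and `estimate_savings` and `batch_size` are hypothetical names.

```python
import gzip


def estimate_savings(payloads, batch_size=100):
    """Group sampled payloads into pseudo-batches and gzip each one.

    Returns (raw_bytes, compressed_bytes, ratio), where ratio is
    compressed_bytes / raw_bytes, or 1.0 when there is no data.
    """
    raw = 0
    compressed = 0
    for start in range(0, len(payloads), batch_size):
        batch = b"".join(payloads[start:start + batch_size])
        raw += len(batch)
        compressed += len(gzip.compress(batch))
    ratio = compressed / raw if raw else 1.0
    return raw, compressed, ratio
```

Running it with `batch_size=1` and again with a larger value gives a rough feel for how much of the saving comes from batching rather than from compression alone.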
--
This message was sent by Atlassian Jira
(v8.20.1#820001)