Hi Luke,

Thanks for the detailed explanation. I agree that the current proposal of 
RemainingLogs and RemainingSegments will greatly improve the situation, and 
that we can go ahead with the KIP as is.

If RemainingBytes were straightforward to implement, then I’d like to have it.
But we can live without it for now. And if people start using RemainingLogs and 
RemainingSegments and then REALLY FEEL like they need RemainingBytes, then we 
can always add it in the future.

Thanks Luke, for the detailed explanation, and for responding to my feedback!

-James

Sent from my iPhone

> On May 10, 2022, at 6:48 AM, Luke Chen <show...@gmail.com> wrote:
> 
> Hi James and all,
> 
> I checked again, and I can see that when creating a UnifiedLog, we expect the
> logs/indexes/snapshots to be in a good state.
> So, I don't think we should break the current design to expose the
> `RemainingBytesToRecovery` metric.
> 
> If there are no other comments, I'll start a vote within this week.
> 
> Thank you.
> Luke
> 
>> On Fri, May 6, 2022 at 6:00 PM Luke Chen <show...@gmail.com> wrote:
>> 
>> Hi James,
>> 
>> Thanks for your input.
>> 
>> For the `RemainingBytesToRecovery` metric proposal, I think there's one
>> thing I didn't make clear.
>> Currently, when the log manager starts up, we try to load all logs
>> (segments), and during log loading, we recover logs if necessary.
>> And the log loading does use a "thread pool", as you thought.
>> 
>> So, here's the problem:
>> The segments in each log folder (partition) are loaded by a log recovery
>> thread, and only once a log is loaded can we know how many segments (or
>> how many bytes) need to be recovered.
>> That means, if we have 10 partition logs in one broker, and we have 2 log
>> recovery threads (num.recovery.threads.per.data.dir=2), then before the
>> threads load the segments in each log, we only know how many logs
>> (partitions) we have in the broker (i.e. the RemainingLogsToRecover metric).
>> We cannot know how many segments/bytes need to be recovered until each thread
>> starts to load the segments under one log (partition).
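>>
>> To make that concrete, here's a rough sketch (plain Java, NOT the actual
>> LogManager code; listSegmentFiles/recoverSegment are hypothetical helpers)
>> of why a per-thread segment count is only known after the thread picks up a
>> log:
>>
>>    import java.io.File;
>>    import java.util.List;
>>    import java.util.Map;
>>    import java.util.concurrent.ConcurrentHashMap;
>>    import java.util.concurrent.atomic.AtomicInteger;
>>
>>    // Hypothetical sketch: the segment count for a log is only discovered once a
>>    // recovery thread starts loading that log, so a per-thread "remaining
>>    // segments" gauge can only be populated at that point.
>>    class RecoverySketch {
>>        private final Map<Integer, AtomicInteger> remainingSegmentsPerThread =
>>                new ConcurrentHashMap<>();
>>
>>        void recoverLog(int threadId, File logDir) {
>>            List<File> segments = listSegmentFiles(logDir); // known only once this log is picked up
>>            AtomicInteger remaining = remainingSegmentsPerThread
>>                    .computeIfAbsent(threadId, id -> new AtomicInteger());
>>            remaining.set(segments.size());                 // value the per-thread gauge would report
>>            for (File segment : segments) {
>>                recoverSegment(segment);                    // placeholder for the real recovery work
>>                remaining.decrementAndGet();
>>            }
>>        }
>>
>>        private List<File> listSegmentFiles(File logDir) {
>>            // stub: the real code would list the *.log segment files under logDir
>>            return List.of();
>>        }
>>
>>        private void recoverSegment(File segment) { /* recovery work elided */ }
>>    }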
>> 
>> So, the example in the KIP shows:
>> Currently, there are still 5 logs (partitions) to recover under the
>> /tmp/log1 dir. And there are 2 threads doing the job, where one thread has
>> 10000 segments to recover, and the other one has 3 segments to recover.
>> 
>>    - kafka.log
>>       - LogManager
>>          - RemainingLogsToRecover
>>             - /tmp/log1 => 5      ← there are 5 logs under /tmp/log1 needed to be recovered
>>             - /tmp/log2 => 0
>>          - RemainingSegmentsToRecover
>>             - /tmp/log1           ← 2 threads are doing log recovery for /tmp/log1
>>                - 0 => 10000       ← there are 10000 segments needed to be recovered for thread 0
>>                - 1 => 3
>>             - /tmp/log2
>>                - 0 => 0
>>                - 1 => 0
>> 
>> So, after a while, the metrics might look like this:
>> It says that now there are only 3 logs left to recover in /tmp/log1, and
>> thread 0 has 9000 segments left, and thread 1 has 5 segments left
>> (which implies thread 1 has already completed 2 log recoveries in that
>> period).
>> 
>>    - kafka.log
>>       - LogManager
>>          - RemainingLogsToRecover
>>             - /tmp/log1 => 3      ← there are 3 logs under /tmp/log1 needed to be recovered
>>             - /tmp/log2 => 0
>>          - RemainingSegmentsToRecover
>>             - /tmp/log1           ← 2 threads are doing log recovery for /tmp/log1
>>                - 0 => 9000        ← there are 9000 segments needed to be recovered for thread 0
>>                - 1 => 5
>>             - /tmp/log2
>>                - 0 => 0
>>                - 1 => 0
>> 
>> That said, the `RemainingBytesToRecovery` metric is difficult to implement
>> in the way you expected. I think the current proposal with `RemainingLogsToRecover`
>> and `RemainingSegmentsToRecover` should already provide enough info about
>> the log recovery progress.
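>>
>> As a side note for operators: once these gauges exist, they can be polled over
>> JMX while the broker is still recovering. Below is a rough Java sketch; the
>> ObjectName pattern is only an assumption based on the names in this thread
>> (the final names/tags come from the KIP), and broker-host:9999 is a
>> hypothetical JMX endpoint:
>>
>>    import javax.management.MBeanServerConnection;
>>    import javax.management.ObjectName;
>>    import javax.management.remote.JMXConnector;
>>    import javax.management.remote.JMXConnectorFactory;
>>    import javax.management.remote.JMXServiceURL;
>>
>>    public class RecoveryProgressWatcher {
>>        public static void main(String[] args) throws Exception {
>>            JMXServiceURL url = new JMXServiceURL(
>>                    "service:jmx:rmi:///jndi/rmi://broker-host:9999/jmxrmi"); // hypothetical endpoint
>>            JMXConnector connector = JMXConnectorFactory.connect(url);
>>            try {
>>                MBeanServerConnection conn = connector.getMBeanServerConnection();
>>                // Assumed pattern: match every "Remaining*" gauge under kafka.log;
>>                // adjust to the final metric names once the KIP is implemented.
>>                ObjectName pattern = new ObjectName("kafka.log:type=LogManager,name=Remaining*,*");
>>                for (ObjectName name : conn.queryNames(pattern, null)) {
>>                    Object value = conn.getAttribute(name, "Value"); // Yammer gauges expose "Value"
>>                    System.out.println(name + " => " + value);
>>                }
>>            } finally {
>>                connector.close();
>>            }
>>        }
>>    }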
>> 
>> I've also updated the KIP example to make it clear.
>> 
>> 
>> Thank you.
>> Luke
>> 
>> 
>>> On Thu, May 5, 2022 at 3:31 AM James Cheng <wushuja...@gmail.com> wrote:
>>> 
>>> Hi Luke,
>>> 
>>> Thanks for adding RemainingSegmentsToRecovery.
>>> 
>>> Another thought: different topics can have different segment sizes. I
>>> don't know how common it is, but it is possible. Some topics might want
>>> small segment sizes to allow more granular expiration of data.
>>> 
>>> The downside of RemainingLogsToRecovery and RemainingSegmentsToRecovery
>>> is that the rate at which they decrement depends on the configuration and
>>> patterns of the topics, partitions, and segment sizes. If someone is
>>> monitoring those metrics, they might see times where the metric decrements
>>> slowly, followed by a burst where it decrements quickly.
>>> 
>>> What about RemainingBytesToRecovery? This would not depend on the
>>> configuration of the topic or of the data. It would actually be a pretty
>>> good metric, because I think that this metric would change at a constant
>>> rate (based on the disk I/O speed that the broker allocates to recovery).
>>> Because it changes at a constant rate, you would be able to use the
>>> rate-of-change to predict when it hits zero, which will let you know when
>>> the broker is going to start up. Like, I would imagine if we graphed
>>> RemainingBytesToRecovery that we'd see a fairly straight line that is
>>> decrementing at a steady rate towards zero.
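>>>
>>> For example (made-up numbers, just to illustrate the rate-of-change idea),
>>> a monitoring script could estimate the time to completion from two samples
>>> of such a metric:
>>>
>>>    // Toy Java snippet: estimate when recovery finishes from two samples of a
>>>    // hypothetical RemainingBytesToRecovery gauge. All numbers are made up.
>>>    long bytesRemainingEarlier = 500_000_000_000L; // sampled at time t0
>>>    long bytesRemainingNow     = 440_000_000_000L; // sampled 60 seconds later
>>>    long elapsedSeconds        = 60;
>>>
>>>    long bytesPerSecond = (bytesRemainingEarlier - bytesRemainingNow) / elapsedSeconds; // ~1 GB/s
>>>    long etaSeconds     = bytesRemainingNow / bytesPerSecond;                           // ~440 s to go
>>>    System.out.println("Estimated time until recovery completes: " + etaSeconds + "s");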
>>> 
>>> What do you think about adding RemainingBytesToRecovery?
>>> 
>>> Or, what would you think about making the primary metric be
>>> RemainingBytesToRecovery, and getting rid of the others?
>>> 
>>> I don't know if I personally would rather have all 3 metrics, or would
>>> just use RemainingBytesToRecovery. I too would like more community input
>>> on which of those metrics would be useful to people.
>>> 
>>> About the JMX metrics, you said that if
>>> num.recovery.threads.per.data.dir=2, there might be a separate
>>> RemainingSegmentsToRecovery counter for each thread. Is that actually how
>>> the data is structured within the Kafka recovery threads? Does each thread
>>> get a fixed set of partitions, or is there just one big pool of partitions
>>> that the threads all work on?
>>> 
>>> As a more concrete example:
>>> * If I have 9 small partitions and 1 big partition, and
>>> num.recovery.threads.per.data.dir=2
>>> Does each thread get 5 partitions, which means one thread will finish
>>> much sooner than the other?
>>> OR
>>> Do both threads just work on the set of 10 partitions, which means likely
>>> 1 thread will be busy with the big partition, while the other one ends up
>>> plowing through the 9 small partitions?
>>> 
>>> If each thread gets assigned 5 partitions, then it would make sense that
>>> each thread has its own counter.
>>> If the threads work on a single pool of 10 partitions, then it would
>>> probably mean that the counter is on the pool of partitions itself, and not
>>> on each thread.
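>>>
>>> Just to make the two models I'm asking about concrete, here's a rough Java
>>> sketch (not Kafka's actual implementation; Partition and recover() are
>>> stand-ins):
>>>
>>>    import java.util.List;
>>>    import java.util.concurrent.ExecutorService;
>>>    import java.util.concurrent.atomic.AtomicInteger;
>>>
>>>    // Hypothetical sketch of two possible scheduling models for recovery threads.
>>>    class SchedulingModels {
>>>        record Partition(String name) {}                    // stand-in for a real partition
>>>
>>>        static void recover(Partition p) { /* recovery work elided */ }
>>>
>>>        // Model A: fixed assignment. Each recovery thread owns its own slice of
>>>        // partitions, so a per-thread "remaining" counter is natural.
>>>        static void modelA(List<List<Partition>> slices, ExecutorService executor) {
>>>            for (List<Partition> slice : slices) {
>>>                AtomicInteger remaining = new AtomicInteger(slice.size()); // per-thread counter
>>>                executor.submit(() -> {
>>>                    for (Partition p : slice) { recover(p); remaining.decrementAndGet(); }
>>>                });
>>>            }
>>>        }
>>>
>>>        // Model B: shared pool. Every partition is one task in a common queue, so
>>>        // the "remaining" counter belongs to the pool rather than to any one thread.
>>>        static void modelB(List<Partition> allPartitions, ExecutorService executor) {
>>>            AtomicInteger remainingInPool = new AtomicInteger(allPartitions.size()); // pool-level counter
>>>            for (Partition p : allPartitions) {
>>>                executor.submit(() -> { recover(p); remainingInPool.decrementAndGet(); });
>>>            }
>>>        }
>>>    }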
>>> 
>>> -James
>>> 
>>>> On May 4, 2022, at 5:55 AM, Luke Chen <show...@gmail.com> wrote:
>>>> 
>>>> Hi devs,
>>>> 
>>>> If there are no other comments, I'll start a vote tomorrow.
>>>> 
>>>> Thank you.
>>>> Luke
>>>> 
>>>> On Sun, May 1, 2022 at 5:08 PM Luke Chen <show...@gmail.com> wrote:
>>>> 
>>>>> Hi James,
>>>>> 
>>>>> Sorry for the late reply.
>>>>> 
>>>>> Yes, this is a good point: knowing how many segments need to be recovered
>>>>> when there are some large partitions.
>>>>> I've updated the KIP to add a `RemainingSegmentsToRecover` metric for
>>>>> each log recovery thread, to show the value.
>>>>> The example in the Proposed Changes section here
>>>>> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-831%3A+Add+metric+for+log+recovery+progress#KIP831:Addmetricforlogrecoveryprogress-ProposedChanges>
>>>>> shows what it will look like.
>>>>> 
>>>>> Thanks for the suggestion.
>>>>> 
>>>>> Thank you.
>>>>> Luke
>>>>> 
>>>>> 
>>>>> 
>>>>> On Sat, Apr 23, 2022 at 8:54 AM James Cheng <wushuja...@gmail.com> wrote:
>>>>> 
>>>>>> The KIP describes RemainingLogsToRecovery, which seems to be the number
>>>>>> of partitions in each log.dir.
>>>>>> 
>>>>>> We have some partitions which are much much larger than others. Those
>>>>>> large partitions have many many more segments than others.
>>>>>> 
>>>>>> Is there a way the metric can reflect partition size? Could it be
>>>>>> RemainingSegmentsToRecover? Or even RemainingBytesToRecover?
>>>>>> 
>>>>>> -James
>>>>>> 
>>>>>> Sent from my iPhone
>>>>>> 
>>>>>>> On Apr 20, 2022, at 2:01 AM, Luke Chen <show...@gmail.com> wrote:
>>>>>>> 
>>>>>>> Hi all,
>>>>>>> 
>>>>>>> I'd like to propose a KIP to expose a metric for log recovery progress.
>>>>>>> This metric would give admins a way to monitor the log recovery
>>>>>>> progress.
>>>>>>> Details can be found here:
>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-831%3A+Add+metric+for+log+recovery+progress
>>>>>>> 
>>>>>>> Any feedback is appreciated.
>>>>>>> 
>>>>>>> Thank you.
>>>>>>> Luke
>>>>>> 
>>>>> 
>>> 
>>> 
