Hey Luke,

Let me answer them:

1. If *max-batches-size* is too small, resulting in no records output, will we output any information to the user?

If *max-batches-size* is even smaller than the first batch, then there won't be any output. This is handled by the FileRecords class. I think this is correct, as it is the expected behaviour.

2. After your explanation, I guess the use of *max-batches-size* won't conflict with *max-message-size*, right?

The only interesting thing I have just found is that *max-message-size* is not used when dumping logs; instead it is used by dumpIndex:
<https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/tools/DumpLogSegments.scala#L150>
So this option is not working when dumping logs. I also checked whether there is a unit test for it, and there is not any. Maybe we could create a ticket for this?

Regards.

On Sat, 5 Mar 2022 at 10:36, Luke Chen <[email protected]> wrote:

> Hi Sergio,
>
> Thanks for the explanation! Very clear!
> I think we should put this example and explanation into the KIP.
>
> Other comments:
> 1. If the *max-batches-size* is too small that it results in no records
> output, will we output any information to the user?
> 2. After your explanation, I guess the use of *max-batches-size* won't
> conflict with *max-message-size*, right?
> That is, the user can set the 2 arguments at the same time. Is that
> correct?
>
> Thank you.
> Luke
>
> On Sat, Mar 5, 2022 at 4:47 PM Sergio Daniel Troiano
> <[email protected]> wrote:
>
> > Hey Luke,
> >
> > Thanks for the interest. It is a good question; please let me explain:
> >
> > *max-message-size* is a filter on the size of each batch. For example,
> > if I set --max-message-size to 1000 bytes and my segment log has 300
> > batches, 150 of them 500 bytes each and the other 150 2000 bytes each,
> > then the script will skip the last 150, as each of those batches is
> > larger than the limit.
> >
> > On the other hand, following the same example with *max-batches-size*
> > set to 1000 bytes, it will only print out the first 2 batches (500
> > bytes each) and stop. This avoids reading the whole file.
> >
> > Also, if all of the batches are smaller than 1000 bytes, it will end
> > up printing out all of them.
> > The idea of my change is to limit the *amount* of batches, no matter
> > their size.
> >
> > I hope this reply helps.
> > Best regards.
> >
> > On Sat, 5 Mar 2022 at 08:00, Luke Chen <[email protected]> wrote:
> >
> > > Hi Sergio,
> > >
> > > Thanks for the KIP!
> > >
> > > One question:
> > > I saw there's a `max-message-size` argument that seems to do the
> > > same thing as you want.
> > > Could you help explain what's the difference between
> > > `max-message-size` and `max-batches-size`?
> > >
> > > Thank you.
> > > Luke
> > >
> > > On Sat, Mar 5, 2022 at 3:21 AM Kirk True <[email protected]> wrote:
> > >
> > > > Hi Sergio,
> > > >
> > > > Thanks for the KIP. I don't know anything about the log segment
> > > > internals, but the logic and implementation seem sound.
> > > >
> > > > Three questions:
> > > > 1. Since the --max-batches-size unit is bytes, does it matter if
> > > > that size doesn't align to a record boundary?
> > > > 2. Can you add a check to make sure that --max-batches-size
> > > > doesn't allow the user to pass in a negative number?
> > > > 3. Can you add/update any unit tests related to the
> > > > DumpLogSegments arguments?
> > > >
> > > > Thanks,
> > > > Kirk
> > > >
> > > > On Thu, Mar 3, 2022, at 1:32 PM, Sergio Daniel Troiano wrote:
> > > > >
> > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-824%3A+Allowing+dumping+segmentlogs+limiting+the+batches+in+the+output
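Sergio's example in the thread above (150 batches of 500 bytes followed by 150 batches of 2000 bytes, limit 1000 bytes) can be sketched as follows. This is only an illustrative model of the two flags' intended semantics, not the actual DumpLogSegments code; the function names are hypothetical.

```python
def filter_by_max_message_size(batch_sizes, limit):
    """--max-message-size semantics: skip any individual batch
    larger than the limit; keep reading the rest of the file."""
    return [s for s in batch_sizes if s <= limit]

def cap_by_max_batches_size(batch_sizes, limit):
    """Proposed --max-batches-size semantics: print batches until the
    cumulative bytes would exceed the limit, then stop reading."""
    out, total = [], 0
    for s in batch_sizes:
        if total + s > limit:
            break
        out.append(s)
        total += s
    return out

batches = [500] * 150 + [2000] * 150

# Per-batch filter keeps all 150 small batches and skips the 2000-byte ones.
print(len(filter_by_max_message_size(batches, 1000)))  # 150

# Cumulative cap stops after the first two 500-byte batches (500 + 500 = 1000).
print(len(cap_by_max_batches_size(batches, 1000)))  # 2
```

The key difference: the first is a per-batch filter that still scans the whole segment, while the second is a cumulative cap that lets the tool stop early without reading the rest of the file.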
