hey Luke,

I am interested in expanding the KIP scope but I am a bit concerned this
could create a lot of noise and confusion as they look like very similar
parameters, I agree this is a small change, so I think if I do it properly
it should not be a problem at all, I just will need a couple more of days
as I want to create the proper tests as well.

I have a doubt about editing the KIP, I mean should I add this as a new
feature as well?, should I describe this as a side effect finding? I don't
want this to minimize the main feature I want to deploy as I think the
message size limit is not as important as the limiting the amount of
batches.

It is up to you, if you guys consider we must add this in this KIP then I
will be happy to do it. 😀

Best regards.
Sergio Troiano

On Mon, 7 Mar 2022 at 02:01, Luke Chen <show...@gmail.com> wrote:

> Hi Sergio,
>
> Thanks for your explanation.
> Make sense to me.
>
> > Only interesting thing that I have just found is *max-message-size *is
> not
> used while dump logs are requested, instead it is used by dumpIndex
>
> Are you interested in expanding the scope of this KIP to include the
> *max-message-size* in dumping logs?
> I think it's good because it will also be a small change, and no need to go
> through another KIP discussing/voting process. But I'm fine if you want to
> keep this KIP as is, and create another JIRA ticket for future work.
>
> Thank you.
> Luke
>
> On Mon, Mar 7, 2022 at 6:02 AM Sergio Daniel Troiano
> <sergio.troi...@adevinta.com.invalid> wrote:
>
> > hey Luke,
> >
> > Let me answer them:
> > 1. If the *max-batches-size* is too small that results in no records
> > output, will we output any information to the user?
> >
> > If the  *max-batches-size*is even smaller than the first batch then there
> > won't be any output, this is handled by FileRecords class, I think this
> is
> > correct as this is the expected behaviour.
> >
> > 2. After your explanation, I guess the use of *max-batches-size* won't
> > conflict with *max-message-size*, right?
> >
> > Only interesting thing that I have just found is *max-message-size *is
> not
> > used while dump logs are requested, instead it is used by dumpIndex
> > <
> >
> https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/tools/DumpLogSegments.scala#L150
> > >
> > so,
> > this feature is not working for dumping logs, even though I checked if
> > there is a unit test for this and there is not any. Maybe we could
> create a
> > ticket for this?
> >
> > Regards.
> >
> >
> > On Sat, 5 Mar 2022 at 10:36, Luke Chen <show...@gmail.com> wrote:
> >
> > > Hi Sergio,
> > >
> > > Thanks for the explanation! Very clear!
> > > I think we should put this example and explanation into KIP.
> > >
> > > Other comments:
> > > 1. If the *max-batches-size* is too small that results in no records
> > > output, will we output any information to the user?
> > > 2. After your explanation, I guess the use of *max-batches-size* won't
> > > conflict with *max-message-size*, right?
> > > That is, user can set the 2 arguments at the same time. Is that
> correct?
> > >
> > > Thank you.
> > > Luke
> > >
> > > Thank you.
> > > Luke
> > >
> > > On Sat, Mar 5, 2022 at 4:47 PM Sergio Daniel Troiano
> > > <sergio.troi...@adevinta.com.invalid> wrote:
> > >
> > > > hey Luke,
> > > >
> > > > thanks for the interest, it is a good question, please let me explain
> > > you:
> > > >
> > > > *max-message-size *a filter for the size of each batch, so for
> example
> > if
> > > > Iset --max-message-size 1000 bytes and my segment log has 300
> batches,
> > > 150
> > > > of them has a size of 500 bytes  and the other 150 has a size of 2000
> > > bytes
> > > > then the script will skip the las 150 ones as each batch is heavier
> > than
> > > > the limit.
> > > >
> > > > In the other hand following the same example above with
> > *max-batches-size
> > > > *set
> > > > to 1000 bytes it will only print out the first 2 batches (500 bytes
> > each)
> > > > and stop, This will avoid reading the whole file
> > > >
> > > >
> > > > Also if all of them are smaller than 1000 bytes it will end up
> printing
> > > out
> > > > all the batches.
> > > > The idea of my change is to limit the *amount* of batches no matter
> > their
> > > > size.
> > > >
> > > > I hope this reply helps.
> > > > Best regards.
> > > >
> > > > On Sat, 5 Mar 2022 at 08:00, Luke Chen <show...@gmail.com> wrote:
> > > >
> > > > > Hi Sergio,
> > > > >
> > > > > Thanks for the KIP!
> > > > >
> > > > > One question:
> > > > > I saw there's a `max-message-size` argument that seems to do the
> same
> > > > thing
> > > > > as you want.
> > > > > Could you help explain what's the difference between
> > `max-message-size`
> > > > and
> > > > > `max-batches-size`?
> > > > >
> > > > > Thank you.
> > > > > Luke
> > > > >
> > > > > On Sat, Mar 5, 2022 at 3:21 AM Kirk True <k...@mustardgrain.com>
> > > wrote:
> > > > >
> > > > > > Hi Sergio,
> > > > > >
> > > > > > Thanks for the KIP. I don't know anything about the log segment
> > > > > internals,
> > > > > > but the logic and implementation seem sound.
> > > > > >
> > > > > > Three questions:
> > > > > >  1. Since the --max-batches-size unit is bytes, does it matter if
> > > that
> > > > > > size doesn't align to a record boundary?
> > > > > >  2. Can you add a check to make sure that --max-batches-size
> > doesn't
> > > > > allow
> > > > > > the user to pass in a negative number?
> > > > > >  3. Can you add/update any unit tests related to the
> > DumpLogSegments
> > > > > > arguments?
> > > > > > Thanks,
> > > > > > Kirk
> > > > > >
> > > > > > On Thu, Mar 3, 2022, at 1:32 PM, Sergio Daniel Troiano wrote:
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-824%3A+Allowing+dumping+segmentlogs+limiting+the+batches+in+the+output
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Reply via email to