Bhavesh,

Wouldn't using the default batch size of 16k have avoided this problem
entirely? I think the best solution now is just to change the
configuration. What I am saying is that it is unlikely you will need to do
this again; the problem is just that 1MB partition batches are quite
large, so you run out of memory very quickly with that configuration.

I agree that the Scala producer doesn't have this problem, but it also
doesn't really let you control memory use or the request size very
effectively, which I would argue is a much bigger problem. Once you
introduce those controls you have to configure how to make use of them,
which is what this is about.
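
For concreteness, here is a rough sketch of the kind of configuration
change suggested above: a 32k per-partition batch under the same 64MB
buffer.memory, which leaves room for roughly 2048 batches. The broker
address, topic name, serializers, and class name below are placeholders,
not details from this thread.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ProducerConfigSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");  // placeholder broker list
        props.put("buffer.memory", "67108864");          // 64MB total buffer, as discussed in the thread
        props.put("batch.size", "32768");                // 32k per-partition batches; ~2048 fit in 64MB
        props.put("block.on.buffer.full", "false");      // throw rather than block when the buffer is exhausted
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        KafkaProducer<String, String> producer = new KafkaProducer<String, String>(props);
        producer.send(new ProducerRecord<String, String>("example-topic", "hello"));  // placeholder topic
        producer.close();
    }
}

With batches this small, going from 32 to 128 partitions needs roughly 4MB
of batch buffers instead of 128MB, so it stays well inside the 64MB limit.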

-Jay

On Wed, Nov 5, 2014 at 3:45 PM, Bhavesh Mistry <mistry.p.bhav...@gmail.com>
wrote:

> Hi Jay or Kafka Dev Team,
>
> Any suggestions on how I can deal with this situation of expanding
> partitions for the new Java producer, for scalability (on the consumer side)?
>
> Thanks,
>
> Bhavesh
>
> On Tue, Nov 4, 2014 at 7:08 PM, Bhavesh Mistry <mistry.p.bhav...@gmail.com
> >
> wrote:
>
> > Also, to add to this: the old producer (Scala based) is not impacted by
> > partition changes. So an important scalability feature is taken away from
> > the new Java producer if you do not plan for expansion from the beginning.
> >
> > In other words, the new Java producer loses this critical feature unless
> > you plan ahead.
> >
> > Thanks,
> >
> > Bhavesh
> >
> > On Tue, Nov 4, 2014 at 4:56 PM, Bhavesh Mistry <
> mistry.p.bhav...@gmail.com
> > > wrote:
> >
> >> Hi Jay,
> >>
> >> The fundamental problem is that the batch size is already configured and
> >> the producers are running in production with that configuration (the
> >> previous values were just a sample). How do we increase the partitions
> >> for a topic when the batch size would exceed the configured buffer limit?
> >> Yes, had we planned for a smaller batch size we could do this, but we
> >> cannot if the producers are already running. Have you faced this problem
> >> at LinkedIn or anywhere else?
> >>
> >>
> >> Thanks,
> >>
> >> Bhavesh
> >>
> >> On Tue, Nov 4, 2014 at 4:25 PM, Jay Kreps <jay.kr...@gmail.com> wrote:
> >>
> >>> Hey Bhavesh,
> >>>
> >>> No, there isn't such a setting. But what I am saying is that I don't
> >>> think you really need that feature. I think instead you can use a 32k
> >>> batch size with your 64MB memory limit. This should mean you can have up
> >>> to 2048 batches in flight. Assuming one batch in flight and one being
> >>> added to at any given time, this should work well for up to ~1000
> >>> partitions. So rather than trying to do anything dynamic, assuming each
> >>> producer sends to just one topic, you would be fine as long as that topic
> >>> had fewer than 1000 partitions. If you wanted to add more you would need
> >>> to add memory on the producers.
> >>>
> >>> -Jay
> >>>
> >>> On Tue, Nov 4, 2014 at 4:04 PM, Bhavesh Mistry <
> >>> mistry.p.bhav...@gmail.com>
> >>> wrote:
> >>>
> >>> > Hi Jay,
> >>> >
> >>> > I agree with and understood what you mentioned in the previous email.
> >>> > But when you have 5000+ producers running in the cloud (I am sure
> >>> > LinkedIn has many more, and needs to increase partitions for
> >>> > scalability), then none of the running producers will be able to send
> >>> > any data. So is there any feature or setting that would shrink the
> >>> > batch size to fit the increase? I am sure others will face the same
> >>> > issue. Had I configured block.on.buffer.full=true it would be even
> >>> > worse and would block application threads. In our use case the
> >>> > *logger.log(msg)* method cannot be blocked, which is why we have that
> >>> > configuration set to false.
> >>> >
> >>> > So I am sure others will run into this same issue. I am trying to find
> >>> > the optimal solution and a recommendation from the Kafka dev team for
> >>> > this particular use case (which may become common).
> >>> >
> >>> > Thanks,
> >>> >
> >>> > Bhavesh
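
[Editor's note: a minimal sketch of what the non-blocking logger.log(msg)
path described above might look like with block.on.buffer.full=false. The
class name, drop counter, and callback handling are illustrative
assumptions, not code from this thread.]

import java.util.concurrent.atomic.AtomicLong;
import org.apache.kafka.clients.producer.BufferExhaustedException;
import org.apache.kafka.clients.producer.Callback;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

public class NonBlockingLogger {
    private final Producer<String, String> producer;
    private final String topic;
    private final AtomicLong dropped = new AtomicLong();  // messages dropped instead of blocking the caller

    public NonBlockingLogger(Producer<String, String> producer, String topic) {
        this.producer = producer;
        this.topic = topic;
    }

    public void log(String msg) {
        try {
            // send() hands the record to the background sender and returns immediately;
            // delivery failures are reported asynchronously via the callback
            producer.send(new ProducerRecord<String, String>(topic, msg), new Callback() {
                public void onCompletion(RecordMetadata metadata, Exception e) {
                    if (e != null) {
                        dropped.incrementAndGet();
                    }
                }
            });
        } catch (BufferExhaustedException e) {
            // with block.on.buffer.full=false the producer throws instead of blocking,
            // so the logging call never stalls the application thread
            dropped.incrementAndGet();
        }
    }

    public long droppedCount() {
        return dropped.get();
    }
}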
> >>> >
> >>> > On Tue, Nov 4, 2014 at 3:12 PM, Jay Kreps <jay.kr...@gmail.com>
> wrote:
> >>> >
> >>> > > Hey Bhavesh,
> >>> > >
> >>> > > Here is what your configuration means:
> >>> > > buffer.memory=64MB # This means don't use more than 64MB of memory
> >>> > > batch.size=1MB # This means allocate a 1MB buffer for each partition with data
> >>> > > block.on.buffer.full=false # This means immediately throw an exception
> >>> > >   if there is not enough memory to create a new buffer
> >>> > >
> >>> > > Not sure what linger time you have set.
> >>> > >
> >>> > > So what you see makes sense. If you have 1MB buffers and 32 partitions
> >>> > > then you will have approximately 32MB of memory in use (actually a bit
> >>> > > more than this since one buffer will be filling while another is
> >>> > > sending). If you have 128 partitions then you will try to use 128MB,
> >>> > > and since you have configured the producer to fail when you reach 64MB
> >>> > > (rather than waiting for memory to become available) that is what
> >>> > > happens.
> >>> > >
> >>> > > I suspect you want a smaller batch size. More than 64k is usually not
> >>> > > going to help throughput.
> >>> > >
> >>> > > -Jay
> >>> > >
> >>> > > On Tue, Nov 4, 2014 at 11:39 AM, Bhavesh Mistry <
> >>> > > mistry.p.bhav...@gmail.com>
> >>> > > wrote:
> >>> > >
> >>> > > > Hi Kafka Dev,
> >>> > > >
> >>> > > > With the new producer, we need to change the number of partitions
> >>> > > > for a topic, and we face this BufferExhaustedException issue.
> >>> > > >
> >>> > > > Here is an example: we have set 64MiB of buffer memory, 32
> >>> > > > partitions, and a 1MiB batch size. But when we increase the
> >>> > > > partitions to 128, it throws BufferExhaustedException right away
> >>> > > > (non key based messages), since the buffer is allocated based on
> >>> > > > batch.size. It is a very common need to auto-calculate the batch
> >>> > > > size when partitions increase, because we have about ~5000 boxes
> >>> > > > and it is not practical to deploy code to all machines just to
> >>> > > > expand partitions for scalability purposes. What options are
> >>> > > > available when the new producer is running, the partitions need to
> >>> > > > increase, and there is not enough buffer to allocate a batch for
> >>> > > > the additional partitions?
> >>> > > >
> >>> > > > buffer.memory=64MiB
> >>> > > > batch.size=1MiB
> >>> > > > block.on.buffer.full=false
> >>> > > >
> >>> > > >
> >>> > > > Thanks,
> >>> > > >
> >>> > > > Bhavesh
> >>> > > >
> >>> > >
> >>> >
> >>>
> >>
> >>
> >
>
