As for scalability being a fundamental aspect of Kafka's design and implementation, besides the design doc, I guess this would be another primary reference...
http://www.youtube.com/watch?v=Eq3i2m8aJBI

It's a pretty interesting video that touches on many aspects of Kafka, not just scalability :)

--
Felix

On Tue, Aug 21, 2012 at 10:41 AM, Felix GV <fe...@mate1inc.com> wrote:

> What I meant is that Kafka has been designed first and foremost as a
> high-throughput system, and it achieves that with a couple of techniques,
> but mainly by batching a bunch of events together so that it can benefit
> from the lower overhead of writing sequentially (as opposed to random
> access).
>
> Whether you choose to publish synchronously or asynchronously should not
> change the fact that Kafka can achieve high throughput via batching.
>
> --
> Felix
>
> On Mon, Aug 20, 2012 at 10:55 PM, wm <wmartin...@gmail.com> wrote:
>
>> Felix. My regrets for confusing the matter. Please inform me of a primary
>> source for the canonical use case you reference, unless that was scoped
>> to the Kafka community only. That sort of statement should be clearly
>> documented, imho.
>>
>> I am considering the matter closed with respect to this list. I have 3
>> publish options, each with some degree of autonomy from the calling
>> code's designed behavior.
>>
>> regards
>>
>> On 08/20/2012 02:39 PM, Felix GV wrote:
>>
>>> I think the difference is merely that async publishing is a non-blocking
>>> call, whereas sync publishing is a blocking call, meaning that the code
>>> that does a sync publish call could choose to have an alternate behavior
>>> if the publish failed, whereas the code that does an async publish would
>>> never know whether the publish succeeded or not.
>>>
>>> But like I said, in both cases, you can configure the batching size at
>>> the producer level, and a batching size greater than 1 will provide you
>>> with better throughput capabilities... In fact, I think this is the
>>> canonical use case Kafka was originally built for.
>>>
>>> --
>>> Felix
>>>
>>> On Mon, Aug 20, 2012 at 2:24 PM, will martin <wmartin...@gmail.com> wrote:
>>>
>>>> My understanding is that async is not meant to be an immediate send. As
>>>> to batching, I've not delved into the code differences.
>>>>
>>>> But batching the sync is not possible at the higher Producer level; at
>>>> least that's what I've tried and had no success with: the default and
>>>> string encoders cannot handle lists, although the documentation
>>>> suggests they can.
>>>>
>>>> I'm glad to be wrong on this, but I've had no luck with the serializer
>>>> deep in the Scala code tree accepting a composite of any type
>>>> containing either Message or String. I can batch myself, but I doubt
>>>> this is what any of us think the design goal is?
>>>>
>>>> On Mon, Aug 20, 2012 at 1:06 PM, Felix GV <fe...@mate1inc.com> wrote:
>>>>
>>>>> This may not be entirely related to what you're talking about, but why
>>>>> would an async producer not be able to meet your throughput needs, and
>>>>> a sync producer be able to?
>>>>>
>>>>> Both sync and async producers can be configured to batch more than one
>>>>> message together, and that's pretty much the main thing that's
>>>>> required to be able to achieve good throughput, AFAIK.
>>>>>
>>>>> ...?
>>>>>
>>>>> --
>>>>> Felix
>>>>>
>>>>> On Mon, Aug 20, 2012 at 12:49 PM, will martin <wmartin...@gmail.com> wrote:
>>>>>
>>>>>> Thanks Neha. All my data is of one type. The serializer in place
>>>>>> doesn't seem to handle an array of String.
>>>>>>
>>>>>> The ProducerData I use is a collection of data of the same type,
>>>>>> wrapped in a single definition, according to how I read the spec. Am
>>>>>> I to understand that having a producer batch records itself is
>>>>>> unsupported? The async producer can't meet my throughput needs and,
>>>>>> as I understand it, is targeted at implicit load balancing among
>>>>>> different client machines.
>>>>>>
>>>>>> Additionally, the sync producer can meet my needs, but requires more
>>>>>> use of the lower-level design features. For maintenance, it'd be
>>>>>> great if I could create a list of Strings, create a
>>>>>> ProducerData<String, List<String>>, and have this be serialized.
>>>>>>
>>>>>> It occurs to me that the described serialization may need my
>>>>>> attention?
>>>>>>
>>>>>> Thx
>>>>>>
>>>>>> On Mon, Aug 20, 2012 at 12:06 PM, Neha Narkhede <neha.narkh...@gmail.com> wrote:
>>>>>>
>>>>>>> The producer takes in a "serializer.class" config that it uses to
>>>>>>> serialize data sent by the Producer. A Producer instance is tied to
>>>>>>> the type of data it is sending, so you won't be able to send data
>>>>>>> belonging to diverse types using the same Producer object.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Neha
>>>>>>>
>>>>>>> On Mon, Aug 20, 2012 at 8:02 AM, will martin <wmartin...@gmail.com> wrote:
>>>>>>>
>>>>>>>> This use case is defined by the following snippet from the Design
>>>>>>>> section of the doc pages:
>>>>>>>>
>>>>>>>> class Producer {
>>>>>>>>   public void send(ProducerData)
>>>>>>>>   public void send(List<ProducerData>)
>>>>>>>>   public void close()
>>>>>>>> }
>>>>>>>>
>>>>>>>> I've tried various composites for the List<ProducerData> argument,
>>>>>>>> including Strings and Messages. All of these throw serialization
>>>>>>>> errors deep in the engine.
>>>>>>>>
>>>>>>>> Is the list form of send supported in 7.1?
>>>>>>>>
>>>>>>>> Thanks in advance,
>>>>>>>> mmartin
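
For reference, a minimal sketch of what a batched send against the 0.7.x Java
producer API might look like, based on the Producer/ProducerData classes the
design-doc snippet above refers to. The topic name, ZooKeeper address, event
strings, and class name are placeholders, and the constructor shapes should be
checked against the 0.7.1 javaapi classes. The point it illustrates is that
the value type stays String and the list of messages is carried by
ProducerData itself, so the StringEncoder never has to handle a List<String>:

import java.util.Arrays;
import java.util.List;
import java.util.Properties;

import kafka.javaapi.producer.Producer;
import kafka.javaapi.producer.ProducerData;
import kafka.producer.ProducerConfig;

public class BatchedSyncSend {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("zk.connect", "localhost:2181");  // placeholder ZooKeeper address
        props.put("serializer.class", "kafka.serializer.StringEncoder");
        props.put("producer.type", "sync");  // blocking send; "async" batches via batch.size/queue.time

        Producer<String, String> producer =
            new Producer<String, String>(new ProducerConfig(props));

        // Several String messages batched into one ProducerData for one topic.
        // The value type is String, not List<String>, so the StringEncoder
        // only ever sees individual Strings.
        List<String> events = Arrays.asList("event-1", "event-2", "event-3");
        producer.send(new ProducerData<String, String>("my-topic", events));

        producer.close();
    }
}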
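
And on Neha's point that "serializer.class" ties a Producer instance to a
single value type: a sketch of a custom encoder, assuming the 0.7-era Encoder
interface with a single toMessage method returning a kafka.message.Message.
MyEvent, its fields, and the pipe-delimited wire format are made up for
illustration:

import kafka.message.Message;
import kafka.serializer.Encoder;

// Hypothetical application event type; a Producer<String, MyEvent> configured
// with serializer.class pointing at MyEventEncoder can only send MyEvent values.
class MyEvent {
    final String id;
    final long timestamp;

    MyEvent(String id, long timestamp) {
        this.id = id;
        this.timestamp = timestamp;
    }
}

public class MyEventEncoder implements Encoder<MyEvent> {
    // Public no-arg constructor so the producer can instantiate the encoder
    // reflectively from the "serializer.class" property.
    public MyEventEncoder() {}

    public Message toMessage(MyEvent event) {
        // Illustrative wire format only: "id|timestamp" as bytes.
        return new Message((event.id + "|" + event.timestamp).getBytes());
    }
}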