As for scalability being a fundamental aspect of Kafka's design and
implementation, besides the design doc, I guess this would be another
primary reference...

http://www.youtube.com/watch?v=Eq3i2m8aJBI

It's a pretty interesting video that touches on many aspects of Kafka, not
just scalability :)

--
Felix



On Tue, Aug 21, 2012 at 10:41 AM, Felix GV <fe...@mate1inc.com> wrote:

> What I meant is that Kafka was designed first and foremost as a
> high-throughput system, and it achieves that with a couple of techniques,
> but mainly by batching a bunch of events together so that it can benefit
> from the lower overhead of sequential writes (as opposed to random
> access).
>
> Whether you choose to publish synchronously or asynchronously should not
> change the fact that Kafka can achieve high throughput via batching.
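>
> For illustration, here is a minimal sketch of a 0.7-style producer set up
> to batch (property names like "producer.type", "batch.size" and
> "queue.time" are from memory, so double-check them against the config
> docs):
>
>     import java.util.Properties;
>     import kafka.javaapi.producer.Producer;
>     import kafka.javaapi.producer.ProducerData;
>     import kafka.producer.ProducerConfig;
>
>     public class BatchingProducerSketch {
>         public static void main(String[] args) {
>             Properties props = new Properties();
>             props.put("zk.connect", "localhost:2181");
>             props.put("serializer.class", "kafka.serializer.StringEncoder");
>             props.put("producer.type", "async"); // queue events on a background thread
>             props.put("batch.size", "200");      // flush up to 200 events per write
>             props.put("queue.time", "5000");     // ...or after 5 s, whichever is first
>
>             Producer<String, String> producer =
>                 new Producer<String, String>(new ProducerConfig(props));
>             producer.send(new ProducerData<String, String>("my-topic", "an event"));
>             producer.close();
>         }
>     }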
>
> --
> Felix
>
>
>
>
> On Mon, Aug 20, 2012 at 10:55 PM, wm <wmartin...@gmail.com> wrote:
>
>> Felix, my regrets for confusing the matter. Please point me to a primary
>> source for the canonical use case you reference, unless that claim was
>> scoped to the Kafka community only. That sort of statement should be
>> clearly documented, imho.
>>
>> I am considering the matter closed with respect to this list. I have three
>> publish options, each with some degree of autonomy from the calling code's
>> designed behavior.
>>
>> regards
>>
>>
>> On 08/20/2012 02:39 PM, Felix GV wrote:
>>
>>> I think the difference is merely that async publishing is a non-blocking
>>> call, whereas sync publishing is a blocking call. Code that does a sync
>>> publish can choose an alternate behavior if the publish fails, whereas
>>> code that does an async publish never knows whether the publish succeeded
>>> or not.
>>>
>>> But like I said, in both cases you can configure the batch size at the
>>> producer level, and a batch size greater than 1 will give you better
>>> throughput... In fact, I think this is the canonical use case Kafka was
>>> originally built for.
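>>>
>>> To make that concrete, a hedged sketch against the 0.7 Java API (the
>>> fallback helper is made up for illustration, not something Kafka
>>> provides):
>>>
>>>     // assumes a Producer<String, String> built with producer.type=sync
>>>     void publishOrFallBack(Producer<String, String> producer, String payload) {
>>>         try {
>>>             // sync: send() blocks until the write has completed (or failed)
>>>             producer.send(new ProducerData<String, String>("events", payload));
>>>         } catch (Exception e) {
>>>             // a sync caller can react: retry, spill to a local log, alert...
>>>             writeToLocalFallbackLog(payload); // hypothetical helper
>>>         }
>>>     }
>>>     // with producer.type=async, send() returns immediately and any failure
>>>     // surfaces later on the background sender thread, invisible to the caller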
>>>
>>> --
>>> Felix
>>>
>>>
>>>
>>> On Mon, Aug 20, 2012 at 2:24 PM, will martin <wmartin...@gmail.com>
>>> wrote:
>>>
>>>> My understanding is that async is not meant to be an immediate send. As
>>>> to batching, I've not delved into the code differences.
>>>>
>>>> But batching the sync sends is not possible at the higher Producer level;
>>>> at least, that's what I've tried and had no success with: the default and
>>>> string encoders cannot handle lists, although the documentation suggests
>>>> they can.
>>>>
>>>> I'm glad to be wrong on this, but I've had no luck getting the serializer
>>>> deep in the Scala code tree to accept a composite of any type containing
>>>> either Message or String. I can batch myself, but I doubt this is what
>>>> any of us think the design goal is.
>>>>
>>>>
>>>>
>>>> On Mon, Aug 20, 2012 at 1:06 PM, Felix GV <fe...@mate1inc.com> wrote:
>>>>
>>>>> This may not be entirely related to what you're talking about, but why
>>>>> would an async producer not be able to meet your throughput needs, and a
>>>>> sync producer be able to?
>>>>>
>>>>> Both sync and async producers can be configured to batch more than one
>>>>> message together, and that's pretty much the main thing that's required
>>>>> to be able to achieve good throughput, AFAIK.
>>>>>
>>>>> ...?
>>>>>
>>>>> --
>>>>> Felix
>>>>>
>>>>>
>>>>>
>>>>> On Mon, Aug 20, 2012 at 12:49 PM, will martin <wmartin...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Thanks Neha. All my data is of one type. The serializer in place
>>>>>> doesn't seem to handle an array of String.
>>>>>>
>>>>>> The ProducerData I use is a collection of same-typed data wrapped in a
>>>>>> single definition, as I read the spec. Am I to understand that having a
>>>>>> producer batch records itself is unsupported? The async producer can't
>>>>>> meet my throughput needs and, as I understand it, is targeted at
>>>>>> implicit load balancing among different client machines.
>>>>>>
>>>>>> Additionally, the sync producer can meet my needs, but it requires more
>>>>>> use of the lower-level design features. For maintenance, it'd be great
>>>>>> if I could create a list of Strings, create a ProducerData<String,
>>>>>> List<String>>, and have this be serialized.
>>>>>>
>>>>>> It occurs to me that the described serialization may need my attention?
>>>>>>
>>>>>> Thx
>>>>>>
>>>>>>
>>>>>> On Mon, Aug 20, 2012 at 12:06 PM, Neha Narkhede <neha.narkh...@gmail.com>
>>>>>> wrote:
>>>>>>> The producer takes in a "serializer.class" config that it uses to
>>>>>>> serialize data sent by the Producer. A Producer instance is tied to
>>>>>>> the type of data it is sending, so you won't be able to send data
>>>>>>> belonging to diverse types using the same Producer object.
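>>>>>>>
>>>>>>> For example, something roughly like this (a sketch only; I'm writing
>>>>>>> the 0.7 Encoder interface from memory, and "Event"/"toBytes()" are
>>>>>>> placeholders for your own type):
>>>>>>>
>>>>>>>     public class EventEncoder implements kafka.serializer.Encoder<Event> {
>>>>>>>         public kafka.message.Message toMessage(Event event) {
>>>>>>>             // serialize your type to bytes however you like
>>>>>>>             return new kafka.message.Message(event.toBytes());
>>>>>>>         }
>>>>>>>     }
>>>>>>>
>>>>>>>     // and point the producer at it:
>>>>>>>     // props.put("serializer.class", "com.example.EventEncoder");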
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Neha
>>>>>>>
>>>>>>> On Mon, Aug 20, 2012 at 8:02 AM, will martin <wmartin...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> This use case is defined by the following snippet from the Design
>>>>>>>> section of the doc pages.
>>>>>>>>
>>>>>>>> class Producer {
>>>>>>>>     public void send(ProducerData)
>>>>>>>>     public void send(List<ProducerData>)
>>>>>>>>     public void close()
>>>>>>>> }
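>>>>>>>>
>>>>>>>> For concreteness, this is how I read the list form being called (a
>>>>>>>> sketch only; the generic parameters and the List-taking ProducerData
>>>>>>>> constructor are my reading of the 0.7.1 javadoc, so they may be off):
>>>>>>>>
>>>>>>>>     List<ProducerData<String, String>> batch =
>>>>>>>>         new ArrayList<ProducerData<String, String>>();
>>>>>>>>     batch.add(new ProducerData<String, String>("my-topic", "msg-1"));
>>>>>>>>     batch.add(new ProducerData<String, String>("my-topic", "msg-2"));
>>>>>>>>     producer.send(batch); // the List<ProducerData> form of send
>>>>>>>>
>>>>>>>>     // ProducerData also appears to take a List of messages for one
>>>>>>>>     // topic, rather than needing a ProducerData<String, List<String>>:
>>>>>>>>     producer.send(new ProducerData<String, String>("my-topic",
>>>>>>>>             Arrays.asList("msg-1", "msg-2")));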
>>>>>>>>
>>>>>>>> I've tried various composites for the List<ProducerData> argument,
>>>>>>>> including strings and Messages. All of these throw serialization
>>>>>>>> errors deep in the engine.
>>>>>>>>
>>>>>>>> Is the list form of send supported in 7.1?
>>>>>>>>
>>>>>>>> Thanks in advance,
>>>>>>>> mmartin
>>>>>>>>
>>>>>>>
>>
>
