Hey TJ,

Yea, this sounds like a bug. Can you open a JIRA, and we can continue the
discussion there?

My guess is that there's some buffering going on somewhere, or polling is
not happening in the BrokerProxy.

Cheers,
Chris

On 7/15/14 12:50 PM, "TJ Giuli" <[email protected]> wrote:

>Hey, Chris:
>1.  Yes, batch size set to 1: systems.kafka.producer.batch.num.messages=1
>2.   Building default chooser with: useBatching=false,
>useBootstrapping=false, usePriority=true
>3.  I just re-ran my test and the delay is about 16 seconds.  When I’m
>really pumping data through the batch Kafka topics, I’ve seen around 1
>minute.
>
>Thanks!
>—T
>
>On Jul 15, 2014, at 10:11 AM, Chris Riccomini
><[email protected]> wrote:
>
>> Hey TJ,
>> 
>> This sounds like a bug.
>> 
>> 1. Is the task.consumer.batch.size config set for your job?
>> 2. What does this log line say: Building default chooser with:
>> useBatching=%s, useBootstrapping=%s, usePriority=%s
>> 3. How long is the delay that you're seeing?
>> 
>> Cheers,
>> Chris
>> 
>> On 7/14/14 4:57 PM, "Yan Fang" <[email protected]> wrote:
>> 
>>> It seems for me that, the batch processing fetches many messages at one
>>> time and then takes too long time to process. My first thought is that,
>>> since we have the systems.system-name.samza.fetch.threshold
>>> 
>>><http://samza.incubator.apache.org/learn/documentation/0.7.0/jobs/config
>>>ur
>>> ation-table.html>
>>> ,
>>> setting this number to a smaller (default is 50000) will force the
>>>system
>>> to fetch the messages more frequently.
>>> 
>>> Any other ideas?
>>> 
>>> Fang, Yan
>>> [email protected]
>>> +1 (206) 849-4108
>>> 
>>> 
>>> On Mon, Jul 14, 2014 at 1:12 PM, TJ Giuli <[email protected]>
>>> wrote:
>>> 
>>>> Hi,
>>>> 
>>>> I have a stream processor that takes inputs from multiple streams,
>>>>some
>>>> are more batch, non-latency sensitive and others are real-time,
>>>> infrequently have traffic and should be low-latency.  The real-time
>>>> stream
>>>> helps me interpret the batch stream, so I would ideally like any
>>>> real-time
>>>> stream envelopes delivered within some maximum latency from the time
>>>>the
>>>> message enters into a Kafka topic.
>>>> 
>>>> I have my stream processor configured to prioritize my real-time
>>>>streams
>>>> over the batch streams, but I consistently find that the real-time
>>>> stream
>>>> is delayed by traffic from the batch stream.  From tracing the Kafka
>>>> consumer, it looks like my stream processor periodically fetches from
>>>> Kafka, finds that the batch streams have a large chunk of messages
>>>> waiting,
>>>> doesn¹t find anything on the real-time topics, and processes away the
>>>> batch
>>>> messages for a few minutes.  During the batch processing, the Kafka
>>>> consumer does not poll the real-time streams, so if a message is sent
>>>> to a
>>>> real-time topic, the message effectively doesn¹t arrive until the next
>>>> time
>>>> the Kafka consumer does another fetch.  When a real-time message is
>>>> consumed by the Kafka consumer, the TieredPriorityChooser correctly
>>>> prioritizes traffic from the real-time streams over the batch streams.
>>>> 
>>>> Does anyone have recommendations on how to get infrequent but
>>>>important
>>>> messages within some maximum time bound?  Thanks!
>>>> ‹T
>> 
>

Reply via email to