[
https://issues.apache.org/jira/browse/SAMZA-342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14074941#comment-14074941
]
TJ Giuli commented on SAMZA-342:
--------------------------------
Hey, Chris:
1.) This one is pretty tough to gauge by eye. Eyeballing the system in its
normal running state, and going by the logs from the processes that send out
real-time messages, I believe that once a message is sent it shows up in Kafka
almost instantaneously when I watch the topic with kafka-console-consumer.sh.
2.) I do see those log messages, and once KafkaSystemConsumer gets the
real-time message, there is some latency before my stream processor consumes it:
{noformat}
2014-07-25 13:41:46 KafkaSystemConsumer [TRACE] Incoming message [REALTIME,0]:
MessageAndOffset(Message(magic = 0, attributes = 0, crc = 1154012242, key =
null, payload = java.nio.HeapByteBuffer[pos=0 lim=1647 cap=1647]),5).
2014-07-25 13:41:49 TieredPriorityChooser [TRACE] Got prioritized envelope:
IncomingMessageEnvelope [systemStreamPartition=SystemStreamPartition
[partition=Partition [partition=0], system=kafka, stream=REALTIME], offset=4,
key=null, message="XXX"
{noformat}
So it does appear that KafkaSystemConsumer receives the message and then takes
about 3 seconds (13:41:46 to 13:41:49) to deliver it to my stream processor,
correct?
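The 3-second figure is simply the difference between the two TRACE timestamps. A small helper like the one below makes that measurement repeatable across many log pairs. It is a hypothetical sketch, not part of Samza: the class `TraceLatency` and its methods are made up for illustration, and it assumes every log line starts with a `yyyy-MM-dd HH:mm:ss` timestamp, as the two lines above do. Because those timestamps have one-second resolution, each measured gap is only accurate to within about a second.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper (not Samza code): measures the gap between two log lines.
public class TraceLatency {
    // Assumed timestamp prefix of each log line: "yyyy-MM-dd HH:mm:ss" (19 chars).
    private static final DateTimeFormatter FMT =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    /** Parses the leading 19-character timestamp of a log line. */
    static LocalDateTime timestampOf(String logLine) {
        return LocalDateTime.parse(logLine.substring(0, 19), FMT);
    }

    /** Time between the "fetched" log line and the "chosen" log line. */
    static Duration latency(String fetchedLine, String chosenLine) {
        return Duration.between(timestampOf(fetchedLine), timestampOf(chosenLine));
    }

    public static void main(String[] args) {
        String fetched = "2014-07-25 13:41:46 KafkaSystemConsumer [TRACE] Incoming message [REALTIME,0]: ...";
        String chosen = "2014-07-25 13:41:49 TieredPriorityChooser [TRACE] Got prioritized envelope: ...";
        // Prints "3 s" for the two log lines above.
        System.out.println(latency(fetched, chosen).getSeconds() + " s");
    }
}
```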
> Priority streams experience large latencies before being consumed by the stream processor
> -----------------------------------------------------------------------------------------
>
> Key: SAMZA-342
> URL: https://issues.apache.org/jira/browse/SAMZA-342
> Project: Samza
> Issue Type: Bug
> Components: kafka
> Affects Versions: 0.7.0
> Environment: Ubuntu 13.10
> Reporter: TJ Giuli
>
> I have a stream processor that takes inputs from multiple streams: some are
> batch-like and not latency-sensitive, while others are real-time, carry
> traffic only infrequently, and should be low-latency. The real-time stream
> helps me interpret the batch stream, so ideally I would like every real-time
> envelope delivered within some maximum latency of the time the message enters
> a Kafka topic.
> I have my stream processor configured to prioritize my real-time streams over
> the batch streams, but I consistently find that the real-time stream is
> delayed by traffic from the batch stream. From tracing the Kafka consumer,
> it looks like my stream processor periodically fetches from Kafka, finds that
> the batch streams have a large chunk of messages waiting, doesn’t find
> anything on the real-time topics, and then spends a few minutes working
> through the batch messages. During that batch processing, the Kafka consumer
> does not poll the real-time streams, so a message sent to a real-time topic
> effectively doesn’t arrive until the Kafka consumer's next fetch. Once a
> real-time message has been fetched by the Kafka consumer, the
> TieredPriorityChooser correctly prioritizes traffic from the real-time streams
> over the batch streams.
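The interaction between fetching and prioritization described above can be illustrated with a toy model. This is a hypothetical sketch, not Samza's KafkaSystemConsumer or TieredPriorityChooser: the class `PriorityFetchSketch`, the method `realtimeProcessedAt`, the tick-based clock, and both refetch policies are assumptions made for illustration. On each tick the consumer may fetch, then the chooser processes one envelope, always preferring the real-time stream whenever it has something buffered.

```java
// Toy model (hypothetical, not Samza code): a priority chooser can only pick
// among envelopes that the consumer has already fetched into its buffers.
public class PriorityFetchSketch {

    /**
     * Returns the tick at which the single real-time message is processed.
     *
     * @param batchBacklog              batch messages waiting in Kafka at tick 0
     * @param realtimeProducedAt        tick at which the real-time message is sent
     * @param refetchWhenAnyBufferEmpty true: fetch whenever any stream's buffer is
     *                                  empty; false: fetch only when all are drained
     */
    static int realtimeProcessedAt(int batchBacklog, int realtimeProducedAt,
                                   boolean refetchWhenAnyBufferEmpty) {
        if (batchBacklog < 0 || realtimeProducedAt < 0) {
            throw new IllegalArgumentException("arguments must be non-negative");
        }
        // Messages sitting in Kafka that have not been fetched yet.
        int batchOnBroker = batchBacklog;
        boolean realtimeOnBroker = false;
        // Messages already fetched and buffered inside the consumer.
        int batchBuffered = 0;
        boolean realtimeBuffered = false;

        for (int tick = 0; ; tick++) {
            if (tick == realtimeProducedAt) {
                realtimeOnBroker = true; // the message lands in the real-time topic
            }
            boolean allEmpty = batchBuffered == 0 && !realtimeBuffered;
            boolean anyEmpty = batchBuffered == 0 || !realtimeBuffered;
            if (refetchWhenAnyBufferEmpty ? anyEmpty : allEmpty) {
                // Fetch: move everything currently in Kafka into the buffers.
                batchBuffered += batchOnBroker;
                batchOnBroker = 0;
                if (realtimeOnBroker) {
                    realtimeBuffered = true;
                    realtimeOnBroker = false;
                }
            }
            // Chooser: the real-time stream has higher priority, so it wins as
            // soon as it is buffered; otherwise one batch message is processed.
            if (realtimeBuffered) {
                return tick;
            }
            if (batchBuffered > 0) {
                batchBuffered--;
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("fetch only when drained:     " + realtimeProcessedAt(5, 1, false));
        System.out.println("fetch when any buffer empty: " + realtimeProcessedAt(5, 1, true));
    }
}
```

With a backlog of five batch messages and a real-time message sent at tick 1, the drain-then-fetch policy processes it at tick 5, while the refetch-when-any-buffer-is-empty policy processes it at tick 1. The priority logic is identical in both runs and only the fetch policy differs, which mirrors the observation that the chooser behaves correctly once the message has been fetched. The second policy is only an illustrative alternative: in this model it issues a fetch on almost every tick, because the mostly idle real-time buffer is nearly always empty.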
--
This message was sent by Atlassian JIRA
(v6.2#6252)