[
https://issues.apache.org/jira/browse/SPARK-12203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15048879#comment-15048879
]
Cody Koeninger commented on SPARK-12203:
----------------------------------------
Commented on the PR. I don't think this makes sense for inclusion in Spark, at
least in its current state.
I think efforts towards minimizing latency of the direct stream (assuming that
just tuning your batch sizes smaller isn't sufficient) would be better spent
pursuing pre-fetching / caching on the executors... but that's a noticeable
increase in complexity.
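As a rough illustration of the trade-off (a toy model, not a measurement of Spark): assume a record arrives right at the start of a batch window, and that the direct stream pays the Kafka fetch when the batch's job runs, while a pre-fetching design would already have the data on the executor. The function name and all timing values below are hypothetical.

```python
def worst_case_latency(batch_interval_s, fetch_s, process_s, prefetched):
    """Toy model: seconds from a record's arrival (at the start of a batch
    window) until the job for that batch finishes.

    Assumption: without pre-fetching, the Kafka fetch sits on the critical
    path of the batch job; with pre-fetching, it overlaps the batch window.
    """
    fetch_on_critical_path = 0.0 if prefetched else fetch_s
    return batch_interval_s + fetch_on_critical_path + process_s


if __name__ == "__main__":
    # Hypothetical numbers: 0.2 s to fetch a batch, 0.3 s to process it.
    for interval in (2.0, 0.5):
        direct = worst_case_latency(interval, 0.2, 0.3, prefetched=False)
        cached = worst_case_latency(interval, 0.2, 0.3, prefetched=True)
        print(f"batch={interval}s direct={direct:.2f}s prefetched={cached:.2f}s")
```

Under these assumptions, shrinking the batch interval lowers latency for both variants, while pre-fetching only removes the fetch term.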
> Add KafkaDirectInputDStream that directly pulls messages from Kafka Brokers
> using receivers
> -------------------------------------------------------------------------------------------
>
> Key: SPARK-12203
> URL: https://issues.apache.org/jira/browse/SPARK-12203
> Project: Spark
> Issue Type: New Feature
> Components: Streaming
> Reporter: Liang-Chi Hsieh
>
> Currently, we have DirectKafkaInputDStream, which pulls messages directly
> from Kafka brokers without any receivers, and KafkaInputDStream, which pulls
> messages from a Kafka broker using a receiver with ZooKeeper.
> As we observed, because DirectKafkaInputDStream retrieves messages from Kafka
> after each batch finishes, it incurs extra latency compared with
> KafkaInputDStream, which continues to pull messages during each batch window.
> So we propose adding KafkaDirectInputDStream, which pulls messages directly
> from Kafka brokers as DirectKafkaInputDStream does, but uses receivers as
> KafkaInputDStream does and pulls messages during each batch window.