[ 
https://issues.apache.org/jira/browse/SPARK-12203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15048935#comment-15048935
 ] 

Liang-Chi Hsieh commented on SPARK-12203:
-----------------------------------------

Thanks for commenting.

As I said on the PR, this is a very early attempt to meet our need for a Kafka 
input DStream that offers exactly-once semantics without added latency.

You are right that minimizing latency with these approaches would increase 
complexity. Using receivers is the simplest way we could think of to approach 
this problem for our use case. I am closing this PR now because it is at a 
very early stage and not ready.

> Add KafkaDirectInputDStream that directly pulls messages from Kafka Brokers 
> using receivers
> -------------------------------------------------------------------------------------------
>
>                 Key: SPARK-12203
>                 URL: https://issues.apache.org/jira/browse/SPARK-12203
>             Project: Spark
>          Issue Type: New Feature
>          Components: Streaming
>            Reporter: Liang-Chi Hsieh
>
> Currently, we have DirectKafkaInputDStream, which pulls messages directly 
> from Kafka brokers without any receivers, and KafkaInputDStream, which pulls 
> messages from a Kafka broker using a receiver with ZooKeeper.
> As we observed, because DirectKafkaInputDStream retrieves messages from Kafka 
> after each batch finishes, it incurs extra latency compared with 
> KafkaInputDStream, which continues to pull messages during each batch window.
> So we are trying to add KafkaDirectInputDStream, which pulls messages directly 
> from Kafka brokers as DirectKafkaInputDStream does, but uses receivers as 
> KafkaInputDStream does and pulls messages during each batch window.



