[ https://issues.apache.org/jira/browse/SPARK-15272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15570221#comment-15570221 ]

Cody Koeninger commented on SPARK-15272:
----------------------------------------

Checking to see if the 0.10 consumer's handling of preferred locations 
http://spark.apache.org/docs/latest/streaming-kafka-0-10-integration.html#locationstrategies
 addresses this.
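
For reference, a minimal sketch of the windowed example from the report, rewritten against the 0.10 integration with an explicit location strategy. The broker address, group id, topic, and class name are placeholders, and the spark-streaming-kafka-0-10 artifact is assumed to be on the classpath. Per the linked documentation, PreferConsistent distributes partitions evenly across available executors rather than pinning them to broker hosts; whether that is enough to resolve this issue is exactly what still needs checking.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

public class WindowedDirectStream {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setAppName("WindowedDirectStream");
        // 1s batch duration, as in the report.
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(1));

        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("bootstrap.servers", "broker1:9092"); // placeholder
        kafkaParams.put("key.deserializer", StringDeserializer.class);
        kafkaParams.put("value.deserializer", StringDeserializer.class);
        kafkaParams.put("group.id", "example-group");         // placeholder

        Collection<String> topics = Arrays.asList("example-topic"); // placeholder

        // PreferConsistent does not tie tasks to Kafka broker hosts.
        JavaInputDStream<ConsumerRecord<String, String>> stream =
            KafkaUtils.createDirectStream(
                jssc,
                LocationStrategies.PreferConsistent(),
                ConsumerStrategies.<String, String>Subscribe(topics, kafkaParams));

        // ConsumerRecord is not serializable, so extract the value before windowing.
        stream.map(record -> record.value())
              .window(Durations.milliseconds(10000), Durations.milliseconds(1000))
              .print();

        jssc.start();
        jssc.awaitTermination();
    }
}
```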

> DirectKafkaInputDStream doesn't work with window operation
> ----------------------------------------------------------
>
>                 Key: SPARK-15272
>                 URL: https://issues.apache.org/jira/browse/SPARK-15272
>             Project: Spark
>          Issue Type: Bug
>          Components: Streaming
>    Affects Versions: 1.5.2
>            Reporter: Lubomir Nerad
>
> Using Kafka direct {{DStream}} with simple window operation like:
> {code:java}
> kafkaDStream.window(Durations.milliseconds(10000),
>                     Durations.milliseconds(1000))
>             .print();
> {code}
> with 1s batch duration either freezes after several seconds or lags terribly 
> (depending on cluster mode).
> This happens when Kafka brokers are not part of the Spark cluster (they are 
> on different nodes). The {{KafkaRDD}} still reports them as preferred 
> locations. This doesn't seem to be a problem in non-window scenarios, but with 
> a window it conflicts with the delay scheduling algorithm implemented in 
> {{TaskSetManager}}. It either significantly delays (Yarn mode) or completely 
> drains (Spark mode) resource offers with {{TaskLocality.ANY}}, which are 
> needed to process tasks with these Kafka-broker-aligned preferred locations. 
> When the delay scheduling algorithm is switched off ({{spark.locality.wait=0}}), 
> the example works correctly.
> I think that the {{KafkaRDD}} shouldn't report preferred locations if the 
> brokers don't correspond to worker nodes, or that it should allow the reporting 
> of preferred locations to be switched off. Also, it would be good if the delay 
> scheduling algorithm didn't drain / delay offers when the tasks have unmatched 
> preferred locations.
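
The workaround mentioned in the description (switching off delay scheduling) could be applied in the driver roughly as follows. This is only a sketch: the class and application names are hypothetical, and setting spark.locality.wait to 0 applies to the whole application, so tasks in every stage stop waiting for a more local slot.

```java
import org.apache.spark.SparkConf;

public class LocalityWaitWorkaround {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
            .setAppName("LocalityWaitWorkaround") // hypothetical name
            // Disable delay scheduling: do not wait for a more local slot.
            .set("spark.locality.wait", "0");

        // Print the effective value to confirm the setting was applied.
        System.out.println(conf.get("spark.locality.wait"));
    }
}
```

The same property can also be passed at submit time with `--conf spark.locality.wait=0` instead of hard-coding it.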



