Cody Koeninger commented on SPARK-15272:

Checking to see if the 0.10 consumer's handling of preferred locations 
 addresses this.

> DirectKafkaInputDStream doesn't work with window operation
> ----------------------------------------------------------
>                 Key: SPARK-15272
>                 URL: https://issues.apache.org/jira/browse/SPARK-15272
>             Project: Spark
>          Issue Type: Bug
>          Components: Streaming
>    Affects Versions: 1.5.2
>            Reporter: Lubomir Nerad
> Using a Kafka direct {{DStream}} with a simple window operation like:
> {code:java}
> kafkaDStream.window(Durations.milliseconds(10000),
>                     Durations.milliseconds(1000))
>             .print();
> {code}
> with a 1s batch duration either freezes after several seconds or lags terribly 
> (depending on cluster mode).
> This happens when Kafka brokers are not part of the Spark cluster (they are 
> on different nodes). The {{KafkaRDD}} still reports them as preferred 
> locations. This doesn't seem to be a problem in non-window scenarios, but with 
> a window it conflicts with the delay scheduling algorithm implemented in 
> {{TaskSetManager}}. It either significantly delays (YARN mode) or completely 
> drains (Spark mode) resource offers with {{TaskLocality.ANY}}, which are 
> needed to process tasks with these Kafka-broker-aligned preferred locations. 
> When the delay scheduling algorithm is switched off ({{spark.locality.wait=0}}), 
> the example works correctly.
> I think that the {{KafkaRDD}} shouldn't report preferred locations if the 
> brokers don't correspond to worker nodes, or it should allow the reporting of 
> preferred locations to be switched off. Also, it would be good if the delay 
> scheduling algorithm didn't drain / delay offers when the tasks have 
> unmatched preferred locations.
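
The first suggestion in the description (report a broker as a preferred location only when it corresponds to a worker node) can be sketched in plain Java. This is only an illustration under assumptions: {{PreferredLocationFilter}} and its method are hypothetical names, not Spark code, and the set of executor hosts is assumed to be supplied by the caller. It relies on the fact that an RDD returning an empty list of preferred locations expresses no locality preference.

```java
import java.util.Collections;
import java.util.List;
import java.util.Set;

/** Hypothetical helper for illustration only; not part of Spark. */
final class PreferredLocationFilter {
    private PreferredLocationFilter() {}

    /**
     * Returns the broker host as the single preferred location when an
     * executor runs on that host, and an empty list (no preference) otherwise.
     *
     * @param brokerHost    host of the Kafka broker leading the partition
     * @param executorHosts hosts that currently run Spark executors (assumed input)
     */
    static List<String> preferredLocations(String brokerHost, Set<String> executorHosts) {
        if (brokerHost != null && executorHosts.contains(brokerHost)) {
            return Collections.singletonList(brokerHost);
        }
        // Broker is outside the Spark cluster: report no preference so the
        // scheduler has no unmatched locality to wait for.
        return Collections.emptyList();
    }
}
```

With executors on {{worker1}} and {{worker2}}, a broker on {{kafka1}} yields an empty list, while a broker on {{worker1}} yields a list containing only {{worker1}}.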

