[
https://issues.apache.org/jira/browse/SPARK-15272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15281475#comment-15281475
]
Cody Koeninger commented on SPARK-15272:
----------------------------------------
My PR for the Kafka 0.10 consumer has fine-grained control over preferred
locations.
Is turning locality wait off not a workaround for you in the meantime?
> DirectKafkaInputDStream doesn't work with window operation
> ----------------------------------------------------------
>
> Key: SPARK-15272
> URL: https://issues.apache.org/jira/browse/SPARK-15272
> Project: Spark
> Issue Type: Bug
> Components: Streaming
> Affects Versions: 1.5.2
> Reporter: Lubomir Nerad
>
> Using a Kafka direct {{DStream}} with a simple window operation like:
> {code:java}
> kafkaDStream.window(Durations.milliseconds(10000),
>                     Durations.milliseconds(1000))
>     .print();
> {code}
> with a 1s batch duration either freezes after several seconds or lags
> terribly (depending on cluster mode).
> This happens when the Kafka brokers are not part of the Spark cluster (they
> are on different nodes). The {{KafkaRDD}} still reports them as preferred
> locations. This doesn't seem to be a problem in non-window scenarios, but
> with a window it conflicts with the delay scheduling algorithm implemented in
> {{TaskSetManager}}. It either significantly delays (YARN mode) or completely
> drains (Spark mode) resource offers with {{TaskLocality.ANY}}, which are
> needed to process tasks with these Kafka-broker-aligned preferred locations.
> When the delay scheduling algorithm is switched off
> ({{spark.locality.wait=0}}), the example works correctly.
> I think that the {{KafkaRDD}} shouldn't report preferred locations if the
> brokers don't correspond to worker nodes, or it should allow the reporting
> of preferred locations to be switched off. It would also be good if the
> delay scheduling algorithm didn't drain or delay offers when the tasks have
> unmatched preferred locations.
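
To illustrate the first suggestion in the description above, here is a
minimal sketch of filtering preferred locations against the hosts that
actually run executors. It is not the code from the PR or from {{KafkaRDD}};
the class name, the method name, and the host names are all hypothetical.
It assumes that an empty list is read by the scheduler as "no locality
preference", which is how Spark treats an RDD that reports no preferred
locations.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class PreferredLocationFilter {

    // Hypothetical helper: keep only the preferred hosts that run an executor.
    // Hosts that match no executor are dropped, so a task whose brokers all
    // live outside the cluster ends up with no locality preference at all.
    static List<String> usablePreferredLocations(List<String> preferredHosts,
                                                 Set<String> executorHosts) {
        List<String> usable = new ArrayList<>();
        for (String host : preferredHosts) {
            if (executorHosts.contains(host)) {
                usable.add(host);
            }
        }
        return usable;
    }

    public static void main(String[] args) {
        // Made-up host names, for illustration only.
        Set<String> executorHosts =
            new HashSet<>(Arrays.asList("worker-1", "worker-2"));

        // Broker on a node outside the Spark cluster: nothing is reported.
        System.out.println(
            usablePreferredLocations(Arrays.asList("broker-1"), executorHosts));

        // Broker co-located with a worker: the preference is kept.
        System.out.println(
            usablePreferredLocations(Arrays.asList("worker-2"), executorHosts));
    }
}
```

With a filter like this, tasks for brokers outside the cluster would not have
to wait out {{spark.locality.wait}} before being scheduled, while co-located
brokers would keep their locality benefit.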