Randall Schwager created SPARK-46798:
----------------------------------------

             Summary: Kafka custom partition location assignment in Spark 
Structured Streaming (rack awareness)
                 Key: SPARK-46798
                 URL: https://issues.apache.org/jira/browse/SPARK-46798
             Project: Spark
          Issue Type: New Feature
          Components: Structured Streaming
    Affects Versions: 3.5.0, 3.4.0, 3.3.0, 3.2.0, 3.1.0
            Reporter: Randall Schwager


SPARK-15406 added Kafka consumer support to Spark Structured Streaming, but it 
did not include custom partition location assignment. The Structured Streaming 
Kafka consumer as it exists today evenly allocates Kafka topic partitions to 
executors without regard to Kafka broker rack information or executor 
location. In large deployments, this behavior can drive significant cross-AZ 
networking costs.
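
For context, here is a minimal sketch (with placeholder broker and topic 
names) of how the Structured Streaming Kafka source is configured today. None 
of the available source options influence which executor a given topic 
partition is assigned to:

{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("structured-streaming-kafka-example")
  .getOrCreate()

// Placeholder bootstrap servers and topic name. No option on this source
// controls partition-to-executor placement: partitions are spread evenly
// across executors regardless of broker rack or executor location.
val df = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker-1:9092,broker-2:9092")
  .option("subscribe", "events")
  .load()
{code}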

In the [Design 
Doc|https://docs.google.com/document/d/19t2rWe51x7tq2e5AOfrsM9qb8_m7BRuv9fel9i0PqR8/edit#heading=h.k36c6oyz89xw]
 for SPARK-15406, the ability to assign Kafka partitions to particular 
executors (a feature that would enable rack awareness) was discussed but never 
implemented.

For DStreams users, there is already a way to assign Kafka partitions to Spark 
executors in a custom fashion: 
[LocationStrategies.PreferFixed|https://github.com/apache/spark/blob/master/connector/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/LocationStrategy.scala#L69]
 (see the sketch below).
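
For reference, a minimal DStreams sketch (with hypothetical host names and a 
hypothetical topic) showing how PreferFixed lets callers pin specific topic 
partitions to specific executor hosts:

{code:scala}
import org.apache.kafka.common.TopicPartition
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

val conf = new SparkConf().setAppName("prefer-fixed-example")
val ssc = new StreamingContext(conf, Seconds(10))

val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "broker-1:9092,broker-2:9092",
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id" -> "example-group"
)

// Hypothetical mapping of topic partitions to preferred executor hosts,
// e.g. hosts chosen to sit in the same rack/AZ as each partition's leader.
val hostMap = Map(
  new TopicPartition("events", 0) -> "executor-host-a",
  new TopicPartition("events", 1) -> "executor-host-b"
)

// PreferFixed pins the mapped partitions to the given hosts; partitions not
// in the map fall back to consistent placement. No equivalent hook exists in
// the Structured Streaming Kafka source today.
val stream = KafkaUtils.createDirectStream[String, String](
  ssc,
  LocationStrategies.PreferFixed(hostMap),
  ConsumerStrategies.Subscribe[String, String](Seq("events"), kafkaParams)
)
{code}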

I'd like to propose, and implement if approved, support for custom partition 
location assignment in the Structured Streaming Kafka source. Please find the 
design doc describing the proposed change 
[here|https://docs.google.com/document/d/1RoEk_mt8AUh9sTQZ1NfzIuuYKf1zx6BP1K3IlJ2b8iM/edit#heading=h.pbt6pdb2jt5c].





