Randall Schwager created SPARK-46798:
----------------------------------------
             Summary: Kafka custom partition location assignment in Spark Structured Streaming (rack awareness)
                 Key: SPARK-46798
                 URL: https://issues.apache.org/jira/browse/SPARK-46798
             Project: Spark
          Issue Type: New Feature
          Components: Structured Streaming
    Affects Versions: 3.5.0, 3.4.0, 3.3.0, 3.2.0, 3.1.0
            Reporter: Randall Schwager

SPARK-15406 added Kafka consumer support to Spark Structured Streaming, but it did not add custom partition location assignment as a feature. The Structured Streaming Kafka consumer as it exists today evenly allocates Kafka topic partitions to executors without regard to Kafka broker rack information or executor location. In large deployments, this behavior can drive significant cross-AZ networking costs.

In the [Design Doc|https://docs.google.com/document/d/19t2rWe51x7tq2e5AOfrsM9qb8_m7BRuv9fel9i0PqR8/edit#heading=h.k36c6oyz89xw] for SPARK-15406, the ability to assign Kafka partitions to particular executors (a feature which would enable rack awareness) was discussed, but never implemented.

For DStreams users, there is already a way to assign Kafka partitions to Spark executors in a custom fashion: [LocationStrategies.PreferFixed|https://github.com/apache/spark/blob/master/connector/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/LocationStrategy.scala#L69].

I'd like to propose, and implement if approved, support for custom partition location assignment. Please find the design doc describing the change [here|https://docs.google.com/document/d/1RoEk_mt8AUh9sTQZ1NfzIuuYKf1zx6BP1K3IlJ2b8iM/edit#heading=h.pbt6pdb2jt5c].

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
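To illustrate the kind of rack-aware placement the proposal would enable (today's consumer spreads partitions across executors without considering racks), here is a minimal, self-contained sketch. The function name and data shapes are hypothetical illustrations only, not part of any proposed Spark API: it prefers an executor in the same rack as a partition's leader broker, and falls back to round-robin when no rack-local executor exists.

```python
from collections import defaultdict
from itertools import cycle

def assign_partitions(partition_racks, executor_racks):
    """Hypothetical rack-aware assignment sketch (not a Spark API).

    partition_racks: {partition_id: rack of the partition's leader broker}
    executor_racks:  {executor_id: rack the executor runs in}
    Returns {partition_id: executor_id}.
    """
    # Group executors by rack so rack-local candidates are easy to find.
    executors_by_rack = defaultdict(list)
    for executor, rack in executor_racks.items():
        executors_by_rack[rack].append(executor)

    # One round-robin cursor per rack, plus a global fallback cursor for
    # partitions whose leader rack has no executor.
    local_cursors = {rack: cycle(sorted(execs))
                     for rack, execs in executors_by_rack.items()}
    fallback = cycle(sorted(executor_racks))

    assignment = {}
    for partition, rack in sorted(partition_racks.items()):
        if rack in local_cursors:
            # Rack-local executor available: stay within the rack/AZ,
            # avoiding cross-AZ consumer traffic for this partition.
            assignment[partition] = next(local_cursors[rack])
        else:
            assignment[partition] = next(fallback)
    return assignment
```

For example, with leaders in two racks and one executor per rack, each partition lands on the executor sharing its leader's rack: `assign_partitions({0: "us-east-1a", 1: "us-east-1b", 2: "us-east-1a"}, {"exec-1": "us-east-1a", "exec-2": "us-east-1b"})` yields `{0: "exec-1", 1: "exec-2", 2: "exec-1"}`.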