Overall, the proposal to make this an option for the Kafka source SGTM.
You can address the doc review comments and send the PR (in parallel or
after the review).
Note that executors currently cache the client connection to Kafka and
reuse the connection and buffered records for the next micro-batch.
Your proposal should ideally preserve that affinity as well (both can be done).
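For context, the executor-side caching mentioned above can be sketched roughly
like this. This is a simplified illustration, not Spark's actual internals: the
class and function names are hypothetical, and the real connector pools
consumers with eviction and thread-safety logic omitted here. The key idea is
that a consumer is keyed by (group id, topic, partition), so the next
micro-batch on the same executor reuses the open connection and its buffered
records.

```python
class CachedConsumer:
    """Hypothetical stand-in for a pooled Kafka consumer on an executor."""

    def __init__(self, topic, partition):
        self.topic = topic
        self.partition = partition
        self.buffered = []  # records pre-fetched but not yet served to a batch


# Pool lives for the executor's lifetime: (group_id, topic, partition) -> consumer
_consumer_pool = {}


def acquire_consumer(group_id, topic, partition):
    """Return the cached consumer for this key, creating it on first use."""
    key = (group_id, topic, partition)
    if key not in _consumer_pool:
        _consumer_pool[key] = CachedConsumer(topic, partition)
    return _consumer_pool[key]
```

A rack-aware assignment would ideally keep a given (topic, partition) pinned to
the same executor across micro-batches, so these cache entries stay warm
instead of being recreated each batch.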

On Fri, Jan 26, 2024 at 8:21 AM Schwager, Randall <
randall.schwa...@charter.com> wrote:

> Granted. Thanks for bearing with me. I’ve also opened up permissions to
> allow anyone with the link to edit the document. Thank you!
>
>
>
> *From: *Mich Talebzadeh <mich.talebza...@gmail.com>
> *Date: *Friday, January 26, 2024 at 09:19
> *To: *"Schwager, Randall" <randall.schwa...@charter.com>
> *Cc: *"dev@spark.apache.org" <dev@spark.apache.org>
> *Subject: *Re: [EXTERNAL] Re: Spark Kafka Rack Aware Consumer
>
>
>
> *CAUTION:* The e-mail below is from an external source. Please exercise
> caution before opening attachments, clicking links, or following guidance.
>
> Ok I made a request to access this document
>
> Thanks
>
>
> Mich Talebzadeh,
>
> Dad | Technologist | Solutions Architect | Engineer
>
> London
>
> United Kingdom
>
>
>
> view my Linkedin profile
> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>
>
>
>
>
>
>  https://en.everybodywiki.com/Mich_Talebzadeh
>
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
>
>
>
>
> On Fri, 26 Jan 2024 at 15:48, Schwager, Randall <
> randall.schwa...@charter.com> wrote:
>
> Hi Mich,
>
>
>
> Thanks for responding. In the JIRA issue, the design doc you’re referring
> to describes the prior work.
>
>
>
> This is the design doc for the proposed change:
> https://docs.google.com/document/d/1RoEk_mt8AUh9sTQZ1NfzIuuYKf1zx6BP1K3IlJ2b8iM/edit#heading=h.pbt6pdb2jt5c
>
>
>
> I’ll re-word the description to make that distinction clearer.
>
>
>
> Sincerely,
>
>
>
> Randall
>
>
>
> *From: *Mich Talebzadeh <mich.talebza...@gmail.com>
> *Date: *Friday, January 26, 2024 at 04:30
> *To: *"Schwager, Randall" <randall.schwa...@charter.com>
> *Cc: *"dev@spark.apache.org" <dev@spark.apache.org>
> *Subject: *[EXTERNAL] Re: Spark Kafka Rack Aware Consumer
>
>
>
>
> Your design doc
>
> Structured Streaming Kafka Source - Design Doc - Google Docs
> <https://docs.google.com/document/d/19t2rWe51x7tq2e5AOfrsM9qb8_m7BRuv9fel9i0PqR8/edit#heading=h.k36c6oyz89xw>
>
>
>
> seems to have been around since 2016. Reading the comments, it appears it
> was decided not to progress with it. What has changed since then, please?
>
>
>
> Are you asking whether this doc is still relevant?
>
>
>
> HTH
>
>
>
>
> Mich Talebzadeh,
>
> Dad | Technologist | Solutions Architect | Engineer
>
> London
>
> United Kingdom
>
>
>
> view my Linkedin profile
> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>
>
>
>  https://en.everybodywiki.com/Mich_Talebzadeh
>
>
>
>
>
>
>
>
>
>
> On Thu, 25 Jan 2024 at 20:10, Schwager, Randall <
> randall.schwa...@charter.com> wrote:
>
> Bump.
>
> Am I asking these questions in the wrong place? Or should I forgo design
> input and just write the PR?
>
>
>
> *From: *"Schwager, Randall" <randall.schwa...@charter.com>
> *Date: *Monday, January 22, 2024 at 17:02
> *To: *"dev@spark.apache.org" <dev@spark.apache.org>
> *Subject: *Re: Spark Kafka Rack Aware Consumer
>
>
>
> Hello Spark Devs!
>
>
>
> After doing some detective work, I’d like to revisit this idea in earnest.
> My understanding now is that setting `client.rack` dynamically on the
> executor will do nothing. This is because the driver assigns Kafka
> partitions to executors. I’ve summarized a design to enable rack awareness
> and other location assignment patterns more generally in SPARK-46798
> <https://issues.apache.org/jira/browse/SPARK-46798>.
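The driver-side assignment idea could look roughly like this. This is a
hypothetical sketch, not the SPARK-46798 design itself: since the driver
assigns Kafka partitions to executors, it is the driver that would match each
partition leader's rack against the racks of live executors and emit a
preferred location. The function name and data shapes are illustrative
assumptions.

```python
def preferred_executor(partition_leader_rack, executors_by_rack):
    """Pick an executor in the same rack as the partition's leader, if any.

    partition_leader_rack: rack id of the Kafka partition leader broker.
    executors_by_rack: dict mapping rack id -> list of executor host names.
    Returns a host name, or None when no executor shares the leader's rack
    (the scheduler would then fall back to its default assignment).
    """
    candidates = executors_by_rack.get(partition_leader_rack, [])
    return candidates[0] if candidates else None
```

With the partition pinned to a same-rack executor, that executor's consumer
could legitimately carry the matching `client.rack` value.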
>
>
>
> Since this is my first go at contributing to Spark, could I ask for a
> committer to help shepherd this JIRA issue along?
>
>
>
> Sincerely,
>
>
>
> Randall
>
>
>
> *From: *"Schwager, Randall" <randall.schwa...@charter.com>
> *Date: *Wednesday, January 10, 2024 at 19:39
> *To: *"dev@spark.apache.org" <dev@spark.apache.org>
> *Subject: *Spark Kafka Rack Aware Consumer
>
>
>
> Hello Spark Devs!
>
>
>
> Has there been discussion around adding the ability to dynamically set the
> ‘client.rack’ Kafka parameter at the executor?
>
> The Kafka SQL connector code on master doesn’t seem to support this
> feature. One can easily set the ‘client.rack’ parameter at the driver, but
> that just gives every executor the same rack. If we want each executor to
> use its correct rack, each executor would have to produce the setting
> dynamically at start-up.
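The limitation described above can be illustrated with a small sketch. This is
a hedged illustration, not connector code: `driver_side_config` mimics how a
Kafka option set on the driver (Spark prefixes Kafka consumer properties with
`kafka.`) is shipped unchanged to every executor, while `executor_side_rack`
shows what per-executor behavior would require. The `EXECUTOR_RACK` variable
name is a made-up example of where an executor might learn its rack at
start-up.

```python
import os


def driver_side_config(rack):
    # What works today: one static value, set once on the driver and shipped
    # unchanged to every executor -- so all executors claim the same rack.
    return {"kafka.client.rack": rack}


def executor_side_rack():
    # What rack awareness would need: each executor derives its own rack at
    # start-up, e.g. from environment metadata populated by the scheduler.
    # EXECUTOR_RACK is a hypothetical variable name for illustration only.
    return os.environ.get("EXECUTOR_RACK", "unknown")
```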
>
>
>
> Would this be a good area to consider contributing new functionality?
>
>
>
> Sincerely,
>
>
>
> Randall
>
>
>
>
