In answer to this part of your question:

"*1. Understanding the Issue:* Are there known reasons within Spark that
could explain this difference in behavior when loading dependencies via
`--packages` versus placing JARs directly?"

`--jars` adds only the JARs you list.
`--packages` adds the JAR and its dependencies, as listed in Maven.
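
To make the difference concrete, here is a small Python sketch that only
assembles the two `spark-submit` command lines for comparison (it does not
run Spark). The JAR path, the connector version and the application file
name are placeholders assumed for illustration; they are not taken from
this thread.

```python
# Sketch: build the two spark-submit invocations side by side.
# All paths, versions and file names below are illustrative assumptions.

def submit_with_jars(app, jars):
    """--jars takes a comma-separated list of JAR files.
    Only the files listed are added; nothing else is resolved."""
    return ["spark-submit", "--jars", ",".join(jars), app]


def submit_with_packages(app, coordinates):
    """--packages takes comma-separated Maven coordinates
    (groupId:artifactId:version). Spark resolves each coordinate
    together with its transitive dependencies."""
    return ["spark-submit", "--packages", ",".join(coordinates), app]


# Hypothetical local JAR path and application file.
jars_cmd = submit_with_jars(
    "app.py",
    ["/opt/libs/spark-sql-kafka-0-10_2.12-3.5.1.jar"],
)

# Hypothetical coordinate for the same connector.
packages_cmd = submit_with_packages(
    "app.py",
    ["org.apache.spark:spark-sql-kafka-0-10_2.12:3.5.1"],
)

print(" ".join(jars_cmd))
print(" ".join(packages_cmd))
```

With the `--jars` form, anything the connector itself depends on (for
example the Kafka client library) would also have to be listed by hand.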

*HTH*

Mich Talebzadeh,
Technologist | Architect | Data Engineer  | Generative AI | FinCrime
London
United Kingdom


   view my Linkedin profile
<https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>


 https://en.everybodywiki.com/Mich_Talebzadeh



*Disclaimer:* The information provided is correct to the best of my
knowledge but of course cannot be guaranteed. It is essential to note
that, as with any advice, quote "one test result is worth one-thousand
expert opinions" (Wernher von Braun
<https://en.wikipedia.org/wiki/Wernher_von_Braun>).


On Sat, 4 May 2024 at 12:24, Damien Hawes <marley.ha...@gmail.com> wrote:

> Hi folks,
>
> I'm contributing to the OpenLineage project, specifically the Apache Spark
> integration. My current focus is on extending the project to support data
> lineage extraction for Spark Streaming, beginning with Apache Kafka sources
> and sinks.
>
> I've encountered an obstacle when attempting to access information
> essential for lineage extraction from Apache Kafka-related classes within
> the OpenLineage Spark code base. Specifically, I need to access details
> like Kafka topic names and bootstrap servers from objects like
> StreamingDataSourceV2Relation.
>
> While I can successfully access these details if the Kafka JARs are placed
> directly in the 'spark/jars' directory, I'm unable to do so when using the
> `--packages` option for dependency management. This creates a significant
> obstacle for users who rely on `--packages` for their Spark applications.
>
> I've taken initial steps to investigate (viewable in this GitHub PR
> <https://github.com/OpenLineage/OpenLineage/pull/2647>, the class in
> question is *StreamingDataSourceV2RelationVisitor*), but I'd greatly
> appreciate any insights or guidance on the following:
>
> *1. Understanding the Issue:* Are there known reasons within Spark that
> could explain this difference in behavior when loading dependencies via
> `--packages` versus placing JARs directly?
> *2. Alternative Approaches:*  Are there recommended techniques or
> patterns to access the necessary Kafka class information within a
> SparkListener extension, especially when dependencies are managed via
> `--packages`?
>
> I'm eager to find a solution that avoids heavy reliance on reflection.
>
> Thank you for your time and assistance!
>
> Kind regards,
> Damien
>
>
