Are you running in YARN mode, and do you want to put these jar files into
HDFS in a distributed cluster?
HTH

Dr Mich Talebzadeh,
Architect | Data Science | Financial Crime | Forensic Analysis | GDPR

   view my LinkedIn profile
<https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>

On Sat, 31 May 2025 at 19:47, Nimrod Ofek <ofek.nim...@gmail.com> wrote:

> Hi everyone,
>
> Apologies if this is a basic question—I’ve searched around but haven’t
> found a clear answer.
>
> I'm currently developing a Spark application using Scala, and I’m looking
> for a way to include all the JARs typically bundled in a standard Spark
> installation as a single provided dependency.
>
> From what I’ve seen, most examples add each Spark module individually
> (like spark-core, spark-sql, spark-mllib, etc.) as a separate provided
> dependency. However, since these are all included in the Spark runtime
> environment, I’m wondering why there isn’t a more aggregated
> dependency: something like a parent project or BOM (Bill of Materials)
> that pulls in all the commonly bundled Spark libraries (along with
> compatible versions of Log4j, Guava, Jackson, and so on) for projects
> to depend on.
>
> Is there a particular reason this approach isn’t commonly used? Does it
> cause issues with transitive dependencies or version mismatches? If so,
> I'm sure those can be addressed as well...
>
>
> Thanks in advance for any insights!
>
>
> Best regards,
>
> Nimrod
>
>
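For context, the per-module pattern described in the quoted message usually
looks something like this in an sbt build (a minimal sketch; the Spark and
Scala versions shown are assumptions and should match the target cluster):

// build.sbt (sketch)
ThisBuild / scalaVersion := "2.12.18"   // assumed; match your cluster

val sparkVersion = "3.5.1"              // assumed; match your cluster

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"  % sparkVersion % "provided",
  "org.apache.spark" %% "spark-sql"   % sparkVersion % "provided",
  "org.apache.spark" %% "spark-mllib" % sparkVersion % "provided"
)

Because each module is marked "provided", the application jar stays thin and
the cluster's own Spark distribution supplies those classes at runtime.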
