Creating a custom classloader to load classes from those jars?
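For illustration, a child-first (parent-last) classloader along those lines might look like the rough sketch below. This is only a sketch, not something Spark ships; the class name and the java./javax. delegation rule are my own assumptions:

import java.net.URL
import java.net.URLClassLoader

// Rough sketch of a child-first classloader: it looks in the supplied jars
// before delegating to the parent (e.g. Spark's) classloader, so a newer
// Guava packed into those jars wins over the one on Spark's classpath.
class ChildFirstClassLoader(urls: Array<URL>, parent: ClassLoader) :
    URLClassLoader(urls, parent) {

    override fun loadClass(name: String, resolve: Boolean): Class<*> {
        // JDK classes must always come from the parent loader.
        if (name.startsWith("java.") || name.startsWith("javax.")) {
            return super.loadClass(name, resolve)
        }
        synchronized(getClassLoadingLock(name)) {
            val clazz = findLoadedClass(name)
                ?: try {
                    findClass(name)                 // try the extra jars first
                } catch (e: ClassNotFoundException) {
                    super.loadClass(name, resolve)  // then fall back to the parent
                }
            if (resolve) resolveClass(clazz)
            return clazz
        }
    }
}

The catch is that the conflicting classes then have to be loaded through this loader (for example by installing it as the thread context classloader before the connector is first used), which can get awkward inside Spark itself.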
On Thu, Oct 17, 2024, 19:47, Nimrod Ofek <ofek.nim...@gmail.com> wrote:

Hi,

Thanks all for the replies.

I am adding the Spark dev list as well, as I think this might be an issue that needs to be addressed.

The options presented here will get the jars, but they don't help us with dependency conflicts.
For example, com.google.cloud.bigdataoss:gcs-connector:hadoop3-2.2.0 uses Guava 30 while Spark 3.5.3 uses Guava 14 - the options here will leave both conflicting versions on the classpath.

How can one add packages to their Spark image (during the build of the Docker image) without causing unresolved conflicts?

Thanks!
Nimrod

On Tue, Oct 15, 2024 at 6:53 PM Damien Hawes <marley.ha...@gmail.com> wrote:

Herewith a more fleshed-out example.

An example build.gradle.kts file:

plugins {
    id("java")
}

val sparkJarsDir = objects.directoryProperty().convention(layout.buildDirectory.dir("sparkJars"))

repositories {
    mavenCentral()
}

val sparkJars: Configuration by configurations.creating {
    isCanBeResolved = true
    isCanBeConsumed = false
}

dependencies {
    sparkJars("com.fasterxml.jackson.core:jackson-databind:2.18.0")
}

val copySparkJars by tasks.registering(Copy::class) {
    group = "build"
    description = "Copies the appropriate jars to the configured Spark jars directory"
    from(sparkJars)
    into(sparkJarsDir)
}

Now, the Dockerfile:

FROM spark:3.5.3-scala2.12-java17-ubuntu

USER root

COPY --chown=spark:spark build/sparkJars/* "$SPARK_HOME/jars/"

USER spark

Kind regards,

Damien

On Tue, Oct 15, 2024 at 4:19 PM Damien Hawes <marley.ha...@gmail.com> wrote:

The simplest solution I have found was to use Gradle (or Maven, if you prefer) and list the dependencies I want copied to $SPARK_HOME/jars as project dependencies.

Summary of steps to follow:

1. Using your favourite build tool, declare a dependency on your required packages.
2. Write your Dockerfile, with or without the Spark binaries inside it.
3. Use your build tool to copy the dependencies to a location that the Docker daemon can access.
4. Copy the dependencies into the correct directory.
5. Ensure those files have the correct permissions.

In my opinion, it is pretty easy to do this with Gradle.

On Tue, Oct 15, 2024, 15:28 Nimrod Ofek <ofek.nim...@gmail.com> wrote:

Hi all,

I am creating a base Spark image that we are using internally.
We need to add some packages to the base image: spark:3.5.1-scala2.12-java17-python3-r-ubuntu

Of course I do not want to start Spark with --packages "...", as that is not efficient at all - I would rather add the needed jars to the image.

Ideally, I would add something to my image that installs the needed packages, something like:

RUN $SPARK_HOME/bin/add-packages "..."

But AFAIK there is no such option.

Other than running Spark to fetch those packages and then creating the image, or always running Spark with --packages "...", what can I do?
Is there a way to run just the code that the --packages option runs, without starting Spark, so I can add the needed dependencies to my image?

I am sure I am not the only one, nor the first, to encounter this...

Thanks!
Nimrod
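On the Guava clash raised earlier in the thread: one option not spelled out above is to relocate (shade) the conflicting packages before the jar is copied into $SPARK_HOME/jars, so Spark's own Guava is never touched. A rough build.gradle.kts sketch using the Gradle Shadow plugin; the plugin version, the relocation prefix, and applying it to the GCS connector are assumptions for illustration, not something confirmed in this thread:

plugins {
    id("java")
    // Gradle Shadow plugin, used here to build a fat jar with relocated packages.
    id("com.github.johnrengelman.shadow") version "8.1.1"
}

repositories {
    mavenCentral()
}

dependencies {
    implementation("com.google.cloud.bigdataoss:gcs-connector:hadoop3-2.2.0")
}

tasks.shadowJar {
    // Rewrite the connector's Guava references to a private package so they
    // cannot collide with the Guava version bundled with Spark.
    relocate("com.google.common", "shaded.com.google.common")
    archiveClassifier.set("shaded")
}

The resulting *-shaded.jar could then be copied into $SPARK_HOME/jars/ in the same way as the jars in Damien's Dockerfile example, instead of placing the unshaded connector and its Guava next to Spark's.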