Creating a custom classloader to load classes from those jars?
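For illustration, a child-first (parent-last) classloader along those lines might look like the rough sketch below. This is only a sketch, not something Spark ships; the class name and the java./javax. delegation rule are my own assumptions:

import java.net.URL
import java.net.URLClassLoader

// Rough sketch of a child-first classloader: it looks in the supplied jars
// before delegating to the parent (e.g. Spark's) classloader, so a newer
// Guava packed into those jars wins over the one on Spark's classpath.
class ChildFirstClassLoader(urls: Array<URL>, parent: ClassLoader) :
    URLClassLoader(urls, parent) {

    override fun loadClass(name: String, resolve: Boolean): Class<*> {
        // JDK classes must always come from the parent loader.
        if (name.startsWith("java.") || name.startsWith("javax.")) {
            return super.loadClass(name, resolve)
        }
        synchronized(getClassLoadingLock(name)) {
            val clazz = findLoadedClass(name)
                ?: try {
                    findClass(name)                 // try the extra jars first
                } catch (e: ClassNotFoundException) {
                    super.loadClass(name, resolve)  // then fall back to the parent
                }
            if (resolve) resolveClass(clazz)
            return clazz
        }
    }
}

The catch is that the conflicting classes then have to be loaded through this loader (for example by installing it as the thread context classloader before the connector is first used), which can get awkward inside Spark itself.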
On Thu, Oct 17, 2024, 19:47, Nimrod Ofek <ofek.nim...@gmail.com> wrote:

Hi,

Thanks all for the replies.

I am adding the Spark dev list as well, as I think this might be an issue that needs to be addressed.

The options presented here will get the jars, but they don't help us with dependency conflicts.
For example, com.google.cloud.bigdataoss:gcs-connector:hadoop3-2.2.0 uses Guava 30 while Spark 3.5.3 uses Guava 14 - the options here will leave both conflicting versions on the classpath.

How can one add packages to their Spark image (during the build of the Docker image) without causing unresolved conflicts?

Thanks!
Nimrod

On Tue, Oct 15, 2024 at 6:53 PM Damien Hawes <marley.ha...@gmail.com> wrote:

Herewith a more fleshed-out example.

An example build.gradle.kts file:

plugins {
    id("java")
}

val sparkJarsDir = objects.directoryProperty().convention(layout.buildDirectory.dir("sparkJars"))

repositories {
    mavenCentral()
}

val sparkJars: Configuration by configurations.creating {
    isCanBeResolved = true
    isCanBeConsumed = false
}

dependencies {
    sparkJars("com.fasterxml.jackson.core:jackson-databind:2.18.0")
}

val copySparkJars by tasks.registering(Copy::class) {
    group = "build"
    description = "Copies the appropriate jars to the configured Spark jars directory"
    from(sparkJars)
    into(sparkJarsDir)
}

Now, the Dockerfile:

FROM spark:3.5.3-scala2.12-java17-ubuntu

USER root

COPY --chown=spark:spark build/sparkJars/* "$SPARK_HOME/jars/"

USER spark

Kind regards,

Damien

On Tue, Oct 15, 2024 at 4:19 PM Damien Hawes <marley.ha...@gmail.com> wrote:

The simplest solution I have found was to use Gradle (or Maven, if you prefer) and list the dependencies I want copied to $SPARK_HOME/jars as project dependencies.

Summary of steps to follow:

1. Using your favourite build tool, declare a dependency on your required packages.
2. Write your Dockerfile, with or without the Spark binaries inside it.
3. Use your build tool to copy the dependencies to a location that the Docker daemon can access.
4. Copy the dependencies into the correct directory.
5. Ensure those files have the correct permissions.

In my opinion, it is pretty easy to do this with Gradle.

On Tue, Oct 15, 2024, 15:28 Nimrod Ofek <ofek.nim...@gmail.com> wrote:

Hi all,

I am creating a base Spark image that we are using internally.
We need to add some packages to the base image: spark:3.5.1-scala2.12-java17-python3-r-ubuntu

Of course I do not want to start Spark with --packages "...", as that is not efficient at all - I would rather add the needed jars to the image.

Ideally, I would add something to my image that installs the needed packages, something like:

RUN $SPARK_HOME/bin/add-packages "..."

But AFAIK there is no such option.

Other than running Spark to fetch those packages and then creating the image, or always running Spark with --packages "...", what can I do?
Is there a way to run just the code that the --packages option runs, without starting Spark, so I can add the needed dependencies to my image?

I am sure I am not the only one, nor the first, to encounter this...

Thanks!
Nimrod
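On the Guava clash raised earlier in the thread: one option not spelled out above is to relocate (shade) the conflicting packages before the jar is copied into $SPARK_HOME/jars, so Spark's own Guava is never touched. A rough build.gradle.kts sketch using the Gradle Shadow plugin; the plugin version, the relocation prefix, and applying it to the GCS connector are assumptions for illustration, not something confirmed in this thread:

plugins {
    id("java")
    // Gradle Shadow plugin, used here to build a fat jar with relocated packages.
    id("com.github.johnrengelman.shadow") version "8.1.1"
}

repositories {
    mavenCentral()
}

dependencies {
    implementation("com.google.cloud.bigdataoss:gcs-connector:hadoop3-2.2.0")
}

tasks.shadowJar {
    // Rewrite the connector's Guava references to a private package so they
    // cannot collide with the Guava version bundled with Spark.
    relocate("com.google.common", "shaded.com.google.common")
    archiveClassifier.set("shaded")
}

The resulting *-shaded.jar could then be copied into $SPARK_HOME/jars/ in the same way as the jars in Damien's Dockerfile example, instead of placing the unshaded connector and its Guava next to Spark's.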