[
https://issues.apache.org/jira/browse/SPARK-50294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dongjoon Hyun updated SPARK-50294:
----------------------------------
Fix Version/s: 4.0.0
> Refactor docker image for testing
> ---------------------------------
>
> Key: SPARK-50294
> URL: https://issues.apache.org/jira/browse/SPARK-50294
> Project: Spark
> Issue Type: Umbrella
> Components: Project Infra
> Affects Versions: 4.0.0
> Reporter: Ruifeng Zheng
> Assignee: Ruifeng Zheng
> Priority: Major
> Fix For: 4.0.0
>
>
> Currently we have only a single testing image (_dev/infra/Dockerfile_), shared
> by the jobs {{pyspark}}, {{sparkr}}, {{lint}} and {{docs}}. It has two
> major issues:
> * *disk space limitation*: we keep adding more packages to it, so the
> disk space left for testing is very limited, which causes
> {{No space left on device}} errors from time to time;
> * *environment conflicts*: for example, even though we already install
> some packages for {{docs}} in the Dockerfile, we still need to install
> additional Python packages in {{build_and_test}} because of conflicts
> between {{docs}} and {{pyspark}}. This is hard to maintain because
> related packages are installed in different places.
>
> So we want to split the existing base image into multiple images, so that we can:
> * completely cache all the dependencies for each job;
> * centralize related installations for each job;
> * free up disk space on the base image;
> * introduce new dev tools based on the new images.