[ https://issues.apache.org/jira/browse/SPARK-50294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-50294:
----------------------------------
    Fix Version/s: 4.0.0

> Refactor docker image for testing
> ---------------------------------
>
>                 Key: SPARK-50294
>                 URL: https://issues.apache.org/jira/browse/SPARK-50294
>             Project: Spark
>          Issue Type: Umbrella
>          Components: Project Infra
>    Affects Versions: 4.0.0
>            Reporter: Ruifeng Zheng
>            Assignee: Ruifeng Zheng
>            Priority: Major
>             Fix For: 4.0.0
>
>
> Currently we only have a single testing image (_dev/infra/Dockerfile_) for
> the {{pyspark}}, {{sparkr}}, {{lint}} and {{docs}} jobs, and it has two
> major issues:
>  * *Disk space limitation*: we keep adding more packages to the image, so
> the disk space left for testing is very limited, and jobs fail with
> {{No space left on device}} from time to time;
>  * *Environment conflicts*: for example, even though we already install
> some packages for {{docs}} in the Dockerfile, we still need to install some
> additional Python packages in {{build_and_test}} because of conflicts
> between {{docs}} and {{pyspark}}. This is hard to maintain because the
> related packages are installed in different places.
>  
> So we want to split the existing base image into multiple images, so that we can:
>  * completely cache all the dependencies for each job;
>  * centralize the related installations for each job;
>  * free up disk space in the base image;
>  * introduce new dev tools based on the new images.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)