This is an automated email from the ASF dual-hosted git repository.
gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new 2a2608ad557 [SPARK-39697][INFRA] Add REFRESH_DATE flag and use previous cache to build cache image
2a2608ad557 is described below
commit 2a2608ad557e3ebb160287b7d7fd9d14c251b3c2
Author: Yikun Jiang <[email protected]>
AuthorDate: Thu Jul 7 08:59:38 2022 +0900
[SPARK-39697][INFRA] Add REFRESH_DATE flag and use previous cache to build cache image
### What changes were proposed in this pull request?
This patch has two improvements:
- Add `cache-from`: this speeds up the cache image build and ensures the image does NOT do a full refresh unless `REFRESH_DATE` is changed intentionally (see the workflow step sketch after this list).
- Add `FULL_REFRESH_DATE` to the Dockerfile: this makes it possible to force a full refresh.
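For illustration, a minimal sketch of the resulting `docker/build-push-action` step in the infra cache workflow (the step name and action version are assumptions, not taken from this commit; the image refs match the diff below):

```yaml
# Sketch only: step name and action version are assumed.
# (Registry cache requires a buildx builder, e.g. docker/setup-buildx-action.)
- name: Build and push cache image
  uses: docker/build-push-action@v2
  with:
    context: ./dev/infra/
    push: true
    tags: ghcr.io/apache/spark/apache-spark-github-action-image-cache:${{ github.ref_name }}
    # Reuse layers from the previously published cache image.
    cache-from: type=registry,ref=ghcr.io/apache/spark/apache-spark-github-action-image-cache:${{ github.ref_name }}
    # Export all layers (mode=max) so later builds can reuse them.
    cache-to: type=registry,ref=ghcr.io/apache/spark/apache-spark-github-action-image-cache:${{ github.ref_name }},mode=max
```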
### Why are the changes needed?
Without this PR, any change to the Dockerfile causes the cache image to do a **complete refresh**. This leads to different behavior between the CI temporary image (cache-based refresh, in the pyspark/sparkr/lint jobs) and the infra cache (full refresh, in the build infra cache job).
As a result, if a PR changes the Dockerfile, the pyspark/sparkr/lint CI may pass, but the next pyspark/sparkr/lint run may fail after the cache is refreshed (because dependencies can change when the image does a full refresh).
After this PR, if you change the Dockerfile, the cache image job does a cache-based refresh (reusing the previous cache as much as possible and rebuilding only the layers whose cache no longer matches), keeping its behavior consistent with the pyspark/sparkr/lint job results.
This behavior is similar to a **static image** to some extent: you can bump `FULL_REFRESH_DATE` to force a complete cache refresh, with the advantage that you can see the pyspark/sparkr/lint CI results in GitHub Actions when you do a full refresh. A sketch of the cache-busting mechanism follows.
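As an illustration (a minimal sketch, not the actual contents of dev/infra/Dockerfile beyond the lines in the diff): Docker invalidates the build cache of every instruction after a changed line, so bumping `FULL_REFRESH_DATE` forces all later layers to be rebuilt, while leaving it unchanged lets `cache-from` reuse them.

```dockerfile
# Sketch only: the RUN step below is illustrative, not the real dev/infra/Dockerfile.
FROM ubuntu:20.04

# Bump this date to invalidate the cache for every instruction below it,
# forcing a full rebuild of the cache image.
ENV FULL_REFRESH_DATE 20220706

# With FULL_REFRESH_DATE unchanged, layers like this are reused via cache-from.
RUN apt-get update && apt-get install -y curl
```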
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Tested locally.
Closes #37103 from Yikun/SPARK-39522-FOLLOWUP.
Authored-by: Yikun Jiang <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
---
.github/workflows/build_infra_images_cache.yml | 1 +
dev/infra/Dockerfile | 2 ++
2 files changed, 3 insertions(+)
diff --git a/.github/workflows/build_infra_images_cache.yml b/.github/workflows/build_infra_images_cache.yml
index 4ab27da7bdf..145769d1506 100644
--- a/.github/workflows/build_infra_images_cache.yml
+++ b/.github/workflows/build_infra_images_cache.yml
@@ -57,6 +57,7 @@ jobs:
context: ./dev/infra/
push: true
tags: ghcr.io/apache/spark/apache-spark-github-action-image-cache:${{ github.ref_name }}
+ cache-from: type=registry,ref=ghcr.io/apache/spark/apache-spark-github-action-image-cache:${{ github.ref_name }}
cache-to: type=registry,ref=ghcr.io/apache/spark/apache-spark-github-action-image-cache:${{ github.ref_name }},mode=max
-
name: Image digest
diff --git a/dev/infra/Dockerfile b/dev/infra/Dockerfile
index 8968b097251..e3ba4f6110b 100644
--- a/dev/infra/Dockerfile
+++ b/dev/infra/Dockerfile
@@ -18,6 +18,8 @@
# Image for building and testing Spark branches. Based on Ubuntu 20.04.
FROM ubuntu:20.04
+ENV FULL_REFRESH_DATE 20220706
+
ENV DEBIAN_FRONTEND noninteractive
ENV DEBCONF_NONINTERACTIVE_SEEN true
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]