assignUser commented on code in PR #13453:
URL: https://github.com/apache/arrow/pull/13453#discussion_r909484781


##########
.github/workflows/r_nightly.yml:
##########
@@ -100,18 +149,37 @@ jobs:
           # strip superfluous nested dirs
           new_paths <- sub(art_path, ".", new_paths)
           dirs <- dirname(new_paths)
-          dir_result <- sapply(dirs, dir.create, recursive = TRUE)
+          sapply(dirs, dir.create, recursive = TRUE, showWarnings = FALSE)
 
-          if (!all(dir_result)) {
-            stop("There was an issue while creating the folders!")
-          }
-
-          copy_result <- file.copy(current_path, new_paths)
+          # overwrite allows us to "force push" a new version with the same name
+          copy_result <- file.copy(current_path, new_paths, overwrite = TRUE)
 
           if (!all(copy_result)) {
             stop("There was an issue while copying the files!")
           }
+      - name: Prune Repository
+        shell: bash
+        env:
+          KEEP: ${{ github.event.inputs.keep || 14 }}
+        run: |   
+          prune() {
+            # list files  | retain $KEEP newest files | delete everything else
+            ls -t $1/arrow* | tail -n +$((KEEP + 1)) | xargs --no-run-if-empty rm
+          }
+
+          # find leaf sub dirs
+          repo_dirs=$(find repo -type d -links 2)
 
+          # We want to retain $keep (14) versions of each pkg/lib so we call
+          # prune on each leaf dir and not on repo/.
+          for dir in ${repo_dirs[@]}; do
+            prune $dir
+          done

Review Comment:
   I'll give it a try :)



##########
.github/workflows/r_nightly.yml:
##########
@@ -59,6 +65,12 @@ jobs:
       - name: Install Archery
         shell: bash
         run: pip install -e arrow/dev/archery[all]
+      - uses: actions/cache@v3
+        with:
+          path: repo
+          key: r-nightly-${{ github.run_id }}
+          restore-keys: r-nightly-
+      

Review Comment:
   🤦‍♂️ You are right, it is duplicated.



##########
.github/workflows/r_nightly.yml:
##########
@@ -77,10 +89,47 @@ jobs:
             echo "No files found. Stopping upload."
             exit 1
           fi
+      - name: Cache Repo
+        uses: actions/cache@v3
+        with:
+          path: repo
+          key: r-nightly-${{ github.run_id }}
+          restore-keys: r-nightly-
+      - name: Sync from Remote

Review Comment:
   GHA caches are evicted regularly, especially due to the workaround I described above, which creates a large number of redundant cache objects (this is also in use for the R cpp builds). So we cannot rely on always hitting the cache.
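
   To illustrate, a hedged sketch of what such a download-on-miss step could look like (the actual step body is cut off in this hunk; the flags and the `rsync_key` file are assumptions, only the secret names are taken from this workflow):

   ```yaml
         - name: Sync from Remote
           shell: bash
           run: |
             # If the cache was evicted, repo/ starts out empty; pull the
             # current remote state so later steps see all existing files.
             # rsync_key is assumed to be written from the deploy-key
             # secret in an earlier step.
             mkdir -p repo
             rsync -avz \
               -e "ssh -i rsync_key -p ${{ secrets.NIGHTLIES_RSYNC_PORT }}" \
               ${{ secrets.NIGHTLIES_RSYNC_USER }}@${{ secrets.NIGHTLIES_RSYNC_HOST }}:${{ secrets.NIGHTLIES_RSYNC_PATH }}/arrow/r/ \
               repo/
   ```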



##########
.github/workflows/r_nightly.yml:
##########
@@ -59,6 +65,12 @@ jobs:
       - name: Install Archery
         shell: bash
         run: pip install -e arrow/dev/archery[all]
+      - uses: actions/cache@v3
+        with:
+          path: repo
+          key: r-nightly-${{ github.run_id }}
+          restore-keys: r-nightly-
+      

Review Comment:
   No, this is intentional. `actions/cache` is broken: it will not update the cache on a primary-key hit (there are several issues about this, e.g. https://github.com/actions/cache/discussions/641). This is a (wasteful) workaround: by generating a unique primary key and matching on the `restore-keys` prefix, we force the cache to be "updated" (a new cache object is created on every run). This leads the whole concept of a cache ad absurdum, but there is no way around it for now :(
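
   Annotated, the pattern looks like this (the step is the one from the hunk above; only the comments are added):

   ```yaml
         - uses: actions/cache@v3
           with:
             path: repo
             # unique per run -> never hits, so a fresh cache object is
             # saved at the end of every run
             key: r-nightly-${{ github.run_id }}
             # the prefix match restores the most recently created
             # r-nightly-* cache instead
             restore-keys: r-nightly-
   ```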



##########
.github/workflows/r_nightly.yml:
##########
@@ -124,14 +192,32 @@ jobs:
            tools::write_PACKAGES(dir, type = ifelse(on_win, "win.binary", "mac.binary"), verbose = TRUE )
           }
       - name: Show repo contents
-        run: ls -R repo
-      - name: Upload Files
-        uses: burnett01/[email protected]
-        with:
-          switches: -avzr
-          path: repo/*
-          remote_path: ${{ secrets.NIGHTLIES_RSYNC_PATH }}/arrow/r
-          remote_host: ${{ secrets.NIGHTLIES_RSYNC_HOST }}
-          remote_port: ${{ secrets.NIGHTLIES_RSYNC_PORT }}
-          remote_user: ${{ secrets.NIGHTLIES_RSYNC_USER }}
-          remote_key: ${{ secrets.NIGHTLIES_RSYNC_KEY }}
+        run: tree repo
+      - name: Sync to Remote

Review Comment:
   For functional reasons:
   - the action does not support downloading from the source
     - this could be achieved without rsync but would likely be slower (no auth)
   - the functionality can be implemented in ~10 LOC without the need for Docker, which in itself is a valid reason not to use the action imho (see the sketch below)
   - but also for security reasons:
     - this workflow runs with full access to `secrets` and a ~~r/w `github.token` available to actions~~ (I removed this) -> :warning: see [here](https://cwiki.apache.org/confluence/display/BUILDS/GitHub+Actions+status#GitHubActionsstatus-Security)
     - the action does not validate host keys -> potential for MITM
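
   A minimal sketch of that idea (not the exact step from this PR; the secret names are the ones the removed action already used, and the `known_hosts` handling is illustrative):

   ```yaml
         - name: Sync to Remote
           shell: bash
           run: |
             # Write the deploy key. known_hosts is assumed to hold the
             # server's pinned public key so ssh can verify the host (the
             # action skipped this check, opening the door to MITM).
             echo "${{ secrets.NIGHTLIES_RSYNC_KEY }}" > rsync_key
             chmod 600 rsync_key
             rsync -avzr \
               -e "ssh -i rsync_key -o UserKnownHostsFile=known_hosts -p ${{ secrets.NIGHTLIES_RSYNC_PORT }}" \
               repo/ \
               ${{ secrets.NIGHTLIES_RSYNC_USER }}@${{ secrets.NIGHTLIES_RSYNC_HOST }}:${{ secrets.NIGHTLIES_RSYNC_PATH }}/arrow/r
   ```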



##########
.github/workflows/r_nightly.yml:
##########
@@ -100,18 +149,37 @@ jobs:
           # strip superfluous nested dirs
           new_paths <- sub(art_path, ".", new_paths)
           dirs <- dirname(new_paths)
-          dir_result <- sapply(dirs, dir.create, recursive = TRUE)
+          sapply(dirs, dir.create, recursive = TRUE, showWarnings = FALSE)
 
-          if (!all(dir_result)) {
-            stop("There was an issue while creating the folders!")
-          }
-
-          copy_result <- file.copy(current_path, new_paths)
+          # overwrite allows us to "force push" a new version with the same name
+          copy_result <- file.copy(current_path, new_paths, overwrite = TRUE)
 
           if (!all(copy_result)) {
             stop("There was an issue while copying the files!")
           }
+      - name: Prune Repository
+        shell: bash
+        env:
+          KEEP: ${{ github.event.inputs.keep || 14 }}
+        run: |   
+          prune() {
+            # list files  | retain $KEEP newest files | delete everything else
+            ls -t $1/arrow* | tail -n +$((KEEP + 1)) | xargs --no-run-if-empty rm
+          }
+
+          # find leaf sub dirs
+          repo_dirs=$(find repo -type d -links 2)
 
+          # We want to retain $keep (14) versions of each pkg/lib so we call
+          # prune on each leaf dir and not on repo/.
+          for dir in ${repo_dirs[@]}; do
+            prune $dir
+          done

Review Comment:
   No, we can't, because that would delete by age with no regard to the total number of files: if the nightly builds are not uploaded regularly for a while (e.g. because the build is failing), it would delete all files.
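
   To make the failure mode concrete, a hypothetical age-based prune step (not part of this PR) next to the count-based behaviour of `prune()` above:

   ```yaml
         - name: Prune Repository (age-based, rejected)
           shell: bash
           env:
             KEEP: ${{ github.event.inputs.keep || 14 }}
           run: |
             # Deletes anything older than $KEEP days: if uploads stall
             # for longer (e.g. a broken build), every file is removed.
             # prune() above instead keeps the $KEEP newest files per
             # directory, however old they are.
             find repo -name 'arrow*' -type f -mtime +$KEEP -delete
   ```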


