imbajin opened a new issue, #723:
URL: https://github.com/apache/hugegraph-toolchain/issues/723

   Observed while rerunning `loader-ci` during PR #716 review.
   
   ## Problem
   
   The `Prepare env and service` step in `loader-ci` appears to spend a large 
amount of time repeatedly downloading or rebuilding external dependencies on 
each run, even when the versions do not change.
   
   From the current workflow:
   
   - `.github/workflows/loader-ci.yml` only caches `~/.m2`
   - `hugegraph-loader/assembly/travis/install-hadoop.sh` always downloads 
`hadoop-2.8.5.tar.gz` from `archive.apache.org`
   - `hugegraph-loader/assembly/travis/install-mysql.sh` always runs `docker 
pull mysql:5.7`
   - `hugegraph-loader/assembly/travis/install-hugegraph-from-source.sh` always 
clones `apache/hugegraph` and rebuilds the server package from source
   
   The screenshot from the failing/re-run workflow shows `Prepare env and 
service` taking about 19 minutes, with a large Hadoop tarball download 
dominating the step.
   
   ```text
   loader-ci
   └─ Prepare env and service
      ├─ install-hadoop.sh
      │  └─ wget hadoop-2.8.5.tar.gz  (large tarball, repeated)
      ├─ install-mysql.sh
      │  └─ docker pull mysql:5.7     (repeated image pull)
      └─ install-hugegraph-from-source.sh
         └─ git clone + mvn package   (repeated source build)
   ```
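   
   To confirm which sub-step actually dominates before changing anything, each 
script call could be wrapped in a small timing helper. The sketch below is only 
an illustration: `timed` is a hypothetical function name, not something that 
exists in the repository, and it assumes the step runs under `bash` or another 
shell that supports `local`.
   
   ```shell
   # Hypothetical helper: run a command, then print how long it took.
   # Usage: timed <label> <command...>
   timed() {
       local label="$1"
       shift
       local start end status=0
       start="$(date +%s)"
       # Capture the exit code without letting `set -e` abort before we report.
       "$@" || status=$?
       end="$(date +%s)"
       echo "[timing] ${label}: $((end - start))s (exit ${status})"
       return "$status"
   }
   
   # Example wiring inside the workflow step (not executed here):
   #   timed hadoop    hugegraph-loader/assembly/travis/install-hadoop.sh
   #   timed mysql     hugegraph-loader/assembly/travis/install-mysql.sh
   #   timed hugegraph hugegraph-loader/assembly/travis/install-hugegraph-from-source.sh
   ```
   
   The per-script numbers would show whether the Hadoop download is really the 
main cost or whether the source build contributes a comparable share.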
   
   ## Why this matters
   
   - CI duration is much longer than necessary
   - CI becomes more fragile because it depends on multiple external downloads 
during the test phase
   - Re-runs are expensive even when the code change is unrelated to loader 
integration environments
   - The current cache covers only `~/.m2`, so it likely misses the real 
bottlenecks: the Hadoop tarball, the MySQL image, and the server build
   
   ## Suggested directions
   
   ### Prefer official artifacts / containers over ad-hoc install scripts
   
   - Replace the MySQL setup script with a GitHub Actions `services` container 
or another pinned official image
   - Replace the Hadoop local install script with a pinned container/image or 
other official prebuilt artifact if possible
   - For HugeGraph server, prefer a reusable prebuilt tarball/artifact for the 
pinned commit/version instead of cloning and packaging from source on every CI 
run
   
   ### If scripts must remain, make them cache-aware and idempotent
   
   - Add cache coverage for downloaded tarballs or extracted runtime 
directories if we still use script-based setup
   - Skip `wget` / `docker pull` / clone+build when the required artifact is 
already available
   - Make the scripts check for existing files/directories before 
re-downloading or rebuilding
   - Verify whether GitHub Actions cache is currently missing the relevant 
paths, or whether restore keys are ineffective for this use case
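   
   As a rough sketch of what "cache-aware and idempotent" could look like in 
the scripts, the helpers below skip work when the result is already present. 
The function names `ensure_artifact` and `pull_if_missing`, the `CACHE_DIR` 
variable, and the exact tarball URL path are assumptions for illustration; none 
of them come from the existing scripts.
   
   ```shell
   # Hypothetical helper: produce a file only if it is not already there.
   # The producer command receives a temporary path as its LAST argument, and
   # the file is moved into place only on success, so an interrupted download
   # is never mistaken for a finished one.
   # Usage: ensure_artifact <target-path> <producer-command...>
   ensure_artifact() {
       local target="$1"
       shift
       if [ -e "$target" ]; then
           echo "skip: $target already present"
           return 0
       fi
       local tmp="${target}.part"
       mkdir -p "$(dirname "$target")"
       rm -f "$tmp"
       echo "fetch: $target missing, running: $*"
       "$@" "$tmp" && mv "$tmp" "$target"
   }
   
   # Hypothetical helper: pull a Docker image only if it is not available locally.
   # Usage: pull_if_missing <image:tag>
   pull_if_missing() {
       local image="$1"
       if docker image inspect "$image" >/dev/null 2>&1; then
           echo "skip: $image already present locally"
       else
           echo "pull: $image"
           docker pull "$image"
       fi
   }
   
   # Example wiring (not executed here; CACHE_DIR and the URL path are assumptions):
   #   ensure_artifact "$CACHE_DIR/hadoop-2.8.5.tar.gz" \
   #       wget -q "https://archive.apache.org/dist/hadoop/common/hadoop-2.8.5/hadoop-2.8.5.tar.gz" -O
   #   pull_if_missing mysql:5.7
   ```
   
   These checks only pay off if the target directory is restored between runs, 
for example by adding it to the workflow cache next to `~/.m2`. On a fresh 
runner the local image check will usually miss unless the image was restored 
earlier, and an existence check alone does not verify file integrity, so a 
checksum comparison may still be worth adding.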
   
   ## Possible scope
   
   - `.github/workflows/loader-ci.yml`
   - `hugegraph-loader/assembly/travis/install-hadoop.sh`
   - `hugegraph-loader/assembly/travis/install-mysql.sh`
   - `hugegraph-loader/assembly/travis/install-hugegraph-from-source.sh`
   
   ## Expected outcome
   
   - Repeated `loader-ci` runs should not re-download the same Hadoop tarball 
every time
   - MySQL setup should rely on a reusable/pinned container path rather than 
always pulling inside the script
   - HugeGraph server setup should reuse a stable artifact or cacheable output 
where possible
   - `Prepare env and service` time should drop well below the roughly 19 
minutes observed and vary less between runs
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

