lewismc opened a new pull request, #879:
URL: https://github.com/apache/nutch/pull/879

   This is a large PR that revisits the long-sought replacement of Ant + Ivy 
with Gradle. I integrated much of the work we did a number of years ago and 
also demystified the Gradle implementation for plugins, which had stopped this 
task in its tracks previously. 
   I've tried to maintain parity between the Ant target names and Gradle task 
names so the build feels the same. I updated the README with some new guidance 
as well. I've updated the GitHub Action so hopefully this can be tested more 
thoroughly. 
   I do want to thank the previous contributors to this task as well. They did 
a fantastic job and really did the bulk of the work.
   Thanks for any review.
   
   # Gradle Build Benchmark Results
   
   **Date:** December 27, 2025  
   **System:** macOS (darwin 25.2.0)  
   **Gradle Version:** 8.5  
   **Java Version:** 11+
   
   ## Build Times
   
   | Benchmark | Command | Time | Notes |
   |-----------|---------|------|-------|
   | **Cold build** | `./gradlew clean runtime --no-daemon` | **13.1s** | First build, no caches, no daemon |
   | **Warm build** | `./gradlew clean runtime` | **10.6s** | Daemon running, caches populated |
   | **Incremental build** | `./gradlew runtime` (1 file changed) | **1.1s** | Only recompiles affected code |
   | **No-op build** | `./gradlew runtime` (nothing changed) | **0.9s** | Just checks up-to-date status |
   
   ## Artifact Sizes
   
   | Artifact | Size | Description |
   |----------|------|-------------|
   | `apache-nutch-1.22-SNAPSHOT.jar` | **889 KB** | Core Nutch classes |
   | `apache-nutch-1.22-SNAPSHOT.job` | **305 MB** | Hadoop job JAR with all dependencies |
   | `runtime/local/` | **420 MB** | Full local runtime directory |
   
   ## Task Execution Summary
   
   - **318 total tasks** in the build graph
   - **241 tasks executed** on clean build
   - **76 tasks from cache** (Gradle build cache)
   - **Parallel execution** enabled (`org.gradle.parallel=true`)
   
   ## Gradle Features Utilized
   
   | Feature | Status | Benefit |
   |---------|--------|---------|
   | Incremental compilation | ✅ Enabled | Only recompiles changed files |
   | Build cache | ✅ Enabled | Reuses outputs from previous builds |
   | Parallel execution | ✅ Enabled | Builds independent tasks concurrently |
   | Gradle Daemon | ✅ Enabled | Keeps JVM warm between builds |
   | Up-to-date checking | ✅ Smart | Skips tasks when inputs unchanged |
   
   ## Comparison with Ant Build
   
   ### Build Time Comparison
   
   | Benchmark | Ant | Gradle | Improvement |
   |-----------|-----|--------|-------------|
   | **Cold build** (`clean runtime`) | 20.6s | 13.1s | **36% faster** |
   | **Incremental build** (1 file changed) | 10.0s | 1.1s | **89% faster** |
   | **No-op build** (nothing changed) | 3.8s | 0.9s | **76% faster** |
   
   ### Artifact Size Comparison
   
   | Artifact | Ant | Gradle | Difference |
   |----------|-----|--------|------------|
   | Core JAR | 842 KB | 889 KB | +6% |
   | Job JAR | 292 MB | 305 MB | +4% |
   | `runtime/local/` | 355 MB | 420 MB | +18% |
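
The percentages in the two comparison tables follow from the raw figures as a relative change against the Ant value. A minimal sketch for checking them, assuming `awk` is available; `pct_change` is a hypothetical helper name used only for illustration and is not part of the PR:

```bash
# pct_change OLD NEW: signed change of NEW relative to OLD, rounded to a whole percent.
# Hypothetical helper for checking the tables; not part of the Nutch build.
pct_change() {
  awk -v old="$1" -v new="$2" 'BEGIN { printf "%+.0f%%\n", (new - old) / old * 100 }'
}

pct_change 20.6 13.1   # cold build time (s), Ant vs Gradle
pct_change 10.0 1.1    # incremental build time (s)
pct_change 3.8 0.9     # no-op build time (s)
pct_change 842 889     # core JAR size (KB)
pct_change 292 305     # job JAR size (MB)
pct_change 355 420     # runtime/local/ size (MB)
```

A negative result is a reduction, so the first call's result corresponds to the "36% faster" entry, while the positive results for the size rows correspond to the "Difference" column.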
   
   ### Analysis
   
   **Build Performance:**
   - Gradle's **incremental compilation** provides the biggest win — rebuilding 
after a single file change is **9x faster** than Ant
   - **Cold builds** are 36% faster due to parallel task execution and 
optimized dependency resolution
   - **No-op builds** benefit from Gradle's smart up-to-date checking (0.9s vs 
3.8s)
   
   **Artifact Sizes:**
   - Gradle produces slightly larger artifacts due to different dependency 
resolution
   - The Job JAR is 4% larger but uses the more efficient nested JAR format (vs 
unpacked classes in Ant)
   - Runtime directory is larger due to additional transitive dependencies 
being included
   
   **Developer Experience:**
   - Gradle Daemon keeps JVM warm between builds, reducing startup overhead
   - Build cache allows reusing outputs across clean builds
   - Parallel execution utilizes multiple CPU cores effectively
   
   ## How to Reproduce
   
   ```bash
   # Stop any running daemon
   ./gradlew --stop
   
   # Cold build (no daemon)
   time ./gradlew clean runtime --no-daemon
   
   # Warm build (with daemon)
   time ./gradlew clean runtime
   
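   # Incremental build (change a file's contents, rebuild). Gradle fingerprints
   # inputs by content hash, so a bare `touch` may not trigger recompilation.
   echo "// benchmark edit" >> src/java/org/apache/nutch/crawl/CrawlDb.java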
   time ./gradlew runtime
   
   # No-op build
   time ./gradlew runtime
   
   # Check artifact sizes
   ls -lh build/*.jar build/*.job
   du -sh runtime/local/
   ```

