andygrove opened a new issue, #4671:
URL: https://github.com/apache/datafusion-comet/issues/4671

   ## Describe the bug
   
   The `apache-rat-plugin` is bound to the Maven `verify` phase (`pom.xml`, around 
line 1118). Because `verify` precedes `install` in the default lifecycle, RAT runs 
during every `install` invocation, including the six 
`./mvnw ... -DskipTests install` runs in `dev/release/build-release-comet.sh`. 
(`-DskipTests` skips tests, not RAT.)
   
   RAT scans the root module's directory tree. The exclude list covers 
`**/target/**`, `**/build/**`, `.git/**`, etc., but does NOT exclude several 
untracked generated/scratch directories that accumulate during a release:
   
   - `dev/release/venv/**` (Python virtualenv, thousands of files)
   - `dev/release/comet-rm/workdir/**` (docker build working dir)
   - `dev/dist/**` (extracted release tarballs plus multi-MB `.tar.gz`)
   - `dev/release/rat.txt`, `dev/release/filtered_rat.txt`, 
`dev/release/apache-rat-*.jar`
   
   During a release build these directories are populated, so each RAT pass 
walks a very large number of files and the build appears to hang. Because RAT 
runs in-process inside the Maven JVM, there is no separate `apache-rat` process 
visible in `ps`, which makes it look like the build is stuck rather than busy.
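
To get a rough sense of how much extra work each RAT pass does, the scratch directories listed above can be counted directly. The following is a minimal sketch, not part of the Comet tooling: the helper name `count_files` and the `SCRATCH_DIRS` list are hypothetical, and it assumes it is run from the repository root.

```python
import os

# Directories named in this issue as missing from the RAT exclude list.
# (Hypothetical constant for illustration; adjust to the local tree.)
SCRATCH_DIRS = [
    "dev/release/venv",
    "dev/release/comet-rm/workdir",
    "dev/dist",
]


def count_files(repo_root, rel_dirs=SCRATCH_DIRS):
    """Return {relative_dir: number of non-directory entries beneath it}."""
    counts = {}
    for rel in rel_dirs:
        total = 0
        # os.walk yields nothing for a missing directory, so the count stays 0.
        for _dirpath, _dirnames, filenames in os.walk(os.path.join(repo_root, rel)):
            total += len(filenames)
        counts[rel] = total
    return counts


if __name__ == "__main__":
    # Print one line per directory: file count, then path.
    for rel, n in count_files(".").items():
        print(f"{n:>8}  {rel}")
```

Large counts here would be consistent with the slow RAT step described above, though the script only measures file counts and says nothing about how long RAT spends per file.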
   
   ## To Reproduce
   
   Run a release build (`dev/release/build-release-comet.sh`) on a tree where 
`dev/release/venv` and `dev/dist` are populated, and observe the RAT step 
during the `mvnw install` runs.
   
   ## Expected behavior
   
   RAT should skip generated/scratch directories, which never contain source 
files that require license headers.
   
   ## Proposed fix
   
   Add excludes to the `apache-rat-plugin` configuration in `pom.xml`:
   
   ```xml
   <exclude>dev/release/venv/**</exclude>
   <exclude>dev/release/comet-rm/workdir/**</exclude>
   <exclude>dev/dist/**</exclude>
   <exclude>dev/release/rat.txt</exclude>
   <exclude>dev/release/filtered_rat.txt</exclude>
   <exclude>dev/release/*.jar</exclude>
   ```
   
   This should land on `main` and be cherry-picked to release branches as 
needed.
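
As a sanity check on the proposed patterns, the sketch below tests sample paths against them with a simplified Ant-style glob matcher. It is an approximation written for illustration: `ant_glob_to_regex` and `is_excluded` are hypothetical helpers, the sample paths are made up, and the matching RAT actually performs may differ in edge cases (for example, how `**` treats zero directories).

```python
import re

# The six excludes proposed in this issue.
PROPOSED_EXCLUDES = [
    "dev/release/venv/**",
    "dev/release/comet-rm/workdir/**",
    "dev/dist/**",
    "dev/release/rat.txt",
    "dev/release/filtered_rat.txt",
    "dev/release/*.jar",
]


def ant_glob_to_regex(pattern):
    """Translate a simplified Ant-style glob into a compiled regex.

    '**' matches across directory separators, '*' and '?' stay within
    a single path segment. This is an approximation, not RAT's matcher.
    """
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            parts.append(".*")
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")
            i += 1
        elif pattern[i] == "?":
            parts.append("[^/]")
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def is_excluded(path, patterns=PROPOSED_EXCLUDES):
    """Return True if the '/'-separated relative path fully matches any pattern."""
    return any(ant_glob_to_regex(p).fullmatch(path) for p in patterns)


if __name__ == "__main__":
    # Illustrative sample paths, not taken from a real tree.
    for sample in [
        "dev/release/venv/lib/site.py",
        "dev/dist/example.tar.gz",
        "dev/release/build-release-comet.sh",
    ]:
        print(sample, is_excluded(sample))
```

Under this approximation, the `dev/release/*.jar` pattern covers the `apache-rat-*.jar` download but not jars in subdirectories, and the release script itself remains subject to the license check.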


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

