I recently upgraded httpcomponents-core from apache-rat 0.12 to 0.17 and have seen an increase in RatCheckMojo runtime from 0.911 seconds to 7.448 seconds, as can be seen by comparing these Develocity build reports:
- Before: https://scans.gradle.com/s/kaqbflny4crsc/timeline?toggled=WyIzMiJd&view=by-type
- After: https://scans.gradle.com/s/qzx3nwz6iyn2g/timeline?toggled=WyIxMCJd&view=by-type

I tried `1.0.0-SNAPSHOT` but it was only about one second faster. I used async-profiler to grab a quick flame graph for 1.0.0-SNAPSHOT and I see a tremendous amount of time being spent in Tika charset detection (*not* MIME type detection -- it specifically looks like charset detection), along with lots of regex matching time of course (lots of which is for copyright scanning).

Is there anything I can do on my end to speed this up? Is anyone working on parallelizing processing (RAT-340)? Can charset detection be optimized somehow?
