I recently upgraded httpcomponents-core from apache-rat 0.12 to 0.17 and
have seen RatCheckMojo runtime increase from 0.911 seconds to 7.448
seconds (roughly 8x), as can be seen by comparing these Develocity build
reports:

- Before:
https://scans.gradle.com/s/kaqbflny4crsc/timeline?toggled=WyIzMiJd&view=by-type
- After:
https://scans.gradle.com/s/qzx3nwz6iyn2g/timeline?toggled=WyIxMCJd&view=by-type

I tried `1.0.0-SNAPSHOT` but it was only about one second faster. I used
async-profiler to grab a quick flame graph for 1.0.0-SNAPSHOT and I see a
tremendous amount of time being spent in Tika charset detection (*not* MIME
type detection -- it specifically looks like charset detection), along with
considerable regex matching time (much of it for copyright
scanning). Is there anything I can do on my end to speed this up? Is anyone
working on parallelizing processing (RAT-340)? Can charset detection be
optimized somehow?
