Is there a branch associated with the 0.18 release?

On Sun, Jan 18, 2026 at 4:00 AM P. Ottlinger <[email protected]> wrote:

> Hi Ryan,
>
> On 17.01.26 at 04:52, Ryan Schmitt wrote:
> >> I tried `1.0.0-SNAPSHOT` but it was only about one second faster. I used
> >> async-profiler to grab a quick flame graph for 1.0.0-SNAPSHOT and I see a
> >> tremendous amount of time being spent in Tika charset detection (*not* MIME
> >> type detection -- it specifically looks like charset detection), along with
> >> lots of regex matching time of course (lots of which is for copyright
> >> scanning). Is there anything I can do on my end to speed this up? Is anyone
> >> working on parallelizing processing (RAT-340)? Can charset detection be
> >> optimized somehow?
> thanks for diving into the regression - at the moment we are working on
> a 0.18 bugfix release and in the background we are changing the whole
> module structure/architecture of RAT (will become 1.0.0).
>
> As we introduced Tika to detect files we cannot gauge if there's a
> problem within Tika or the way RAT uses Tika to detect charsets and
> MIME-types.
>
> Feel free to create PRs via GitHub for little improvements you see :)
>
> Cheers,
> Phil
