Hi everyone,
I recently updated my project to the latest Tika trunk (since it has things I 
need) and quickly noticed that my performance tests show a rather big 
regression.
I tracked it down to new code introduced in this changeset:
http://www.mail-archive.com/[email protected]/msg00081.html

Parsing all those SimpleDateFormat strings takes a *long* time. getTimeZone and 
setTimeZone also show up in the profile.

In my testing scenario (lots of simple files; multithreading that makes re-use 
of Metadata objects hard, etc.), Metadata.<init> takes about 1/3 of all Tika 
time, rivaling guessContent and actual parsing in the profiler.

From a quick glance, it seems like all DateFormat creation could be static, or 
otherwise created up front. Is this correct?
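
Here is a rough sketch of what "created up front" could look like. It is not 
Tika's actual code. One caveat is that SimpleDateFormat is not thread-safe, so 
a plain static instance shared across threads would need external 
synchronization. The sketch keeps one set of formats per thread instead. The 
class name CachedDateFormats, its methods, and the pattern list are all 
hypothetical; the real patterns would be whatever Metadata uses today.

```java
import java.text.DateFormat;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper, not part of Tika: builds the formats once per thread
// instead of once per Metadata instance.
final class CachedDateFormats {

    // Illustrative patterns only (assumption); substitute the real list.
    private static final String[] PATTERNS = {
        "yyyy-MM-dd'T'HH:mm:ss'Z'",
        "yyyy-MM-dd'T'HH:mm:ss",
        "yyyy-MM-dd",
    };

    // SimpleDateFormat is not thread-safe, so each thread gets its own set.
    private static final ThreadLocal<DateFormat[]> FORMATS =
            ThreadLocal.withInitial(CachedDateFormats::build);

    private CachedDateFormats() {
    }

    // Pattern parsing and time zone setup happen here, once per thread.
    private static DateFormat[] build() {
        TimeZone utc = TimeZone.getTimeZone("UTC");
        DateFormat[] formats = new DateFormat[PATTERNS.length];
        for (int i = 0; i < PATTERNS.length; i++) {
            SimpleDateFormat format = new SimpleDateFormat(PATTERNS[i], Locale.ROOT);
            format.setTimeZone(utc);
            formats[i] = format;
        }
        return formats;
    }

    // Returns the calling thread's cached formats.
    static DateFormat[] formats() {
        return FORMATS.get();
    }

    // Tries each cached format in order; returns null if none matches.
    static Date parse(String text) {
        for (DateFormat format : formats()) {
            try {
                return format.parse(text);
            } catch (ParseException e) {
                // Fall through to the next pattern.
            }
        }
        return null;
    }
}
```

With something like this, a Metadata constructor would not need to create any 
formats at all, and date handling would go through 
CachedDateFormats.parse(value). Whether a per-thread cache or a synchronized 
static is the better trade-off depends on how many threads are in play.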

Thanks
Radek
