tika-user  

Re: AutoDetectParser not thread-safe?

Jukka Zitting
Tue, 26 Jan 2010 10:25:04 -0800

Hi,

On Tue, Jan 26, 2010 at 6:50 PM, Adam Rauch <a...@labkey.com> wrote:
> We are using Tika 0.5 to parse files that are added to a Lucene index.  If
> we assign multiple threads to the parsing task we find that the
> AutoDetectParser.parse() method will occasionally fail to return.  In our
> case, it appears that a HashMap inside Xerces gets corrupted, causing an
> infinite loop inside HashMap.get().  This seems to be a concurrency problem;
> we have not seen the issue when running single threaded.

Hmm, that's indeed quite troublesome.

> I can open a JIRA issue if you’d prefer.

That would be great. Thanks to your in-depth analysis of the problem,
it should be easy to come up with a fix.
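
Until the root cause is pinned down, one possible stopgap is to
serialize the parse calls behind a single lock, at the cost of parsing
throughput. The sketch below only illustrates that pattern. FragileParser
is a hypothetical stand-in with an unsynchronized HashMap, not Tika's or
Xerces' actual code, and SerializedParsing, parseSafely and runDemo are
names made up for the example. It assumes the corruption is triggered
only by concurrent parse() calls, which has not been confirmed here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SerializedParsing {

    /** Hypothetical stand-in for a parser with unsynchronized internal state. */
    static class FragileParser {
        // A plain HashMap is not safe for concurrent modification.
        private final Map<String, Integer> seen = new HashMap<>();

        int parse(String document) {
            seen.merge(document, 1, Integer::sum);
            return document.length();
        }

        int distinctDocuments() {
            return seen.size();
        }
    }

    private final FragileParser parser = new FragileParser();
    private final Object lock = new Object();

    /** Lets only one thread at a time into the shared parser. */
    public int parseSafely(String document) {
        synchronized (lock) {
            return parser.parse(document);
        }
    }

    /** Parses distinct documents from several threads; returns how many were recorded. */
    public static int runDemo(int threads, int docsPerThread) {
        SerializedParsing wrapper = new SerializedParsing();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            final int id = t;
            pool.execute(() -> {
                for (int i = 0; i < docsPerThread; i++) {
                    wrapper.parseSafely("doc-" + id + "-" + i);
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        synchronized (wrapper.lock) {
            return wrapper.parser.distinctDocuments();
        }
    }

    public static void main(String[] args) {
        System.out.println(runDemo(4, 1000));
    }
}
```

If the shared state turns out to be static inside the XML parser, the
lock would have to cover every caller of that parser in the JVM, not
just the Tika calls.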

BR,

Jukka Zitting