I've been reading through some of the emails referenced, and it looks like the problem might be in the code on the client side.
In one of the emails from May 2013, the client-side code tries to write the entire file to Tika and only then read the extracted text back. I had a similar problem and discovered that, for certain files, Tika starts writing back extracted text before the entire file has been written. At some point a deadlock arose: the client was blocked writing the rest of the file while Tika was blocked writing extracted text, each side waiting for the other to read what had already been written to the socket. I solved this by running the read part on the client side in a separate thread. This appears to work fine – I have seen no strange hangs even after feeding close to a million files, in sizes up to 100 MB, through a single Tika process.
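
For illustration, here is a minimal sketch of the separate-reader-thread idea in Java. It is not the code from the emails, and the names TikaSocketClient and extractText are made up for this example. It assumes the server treats a half-closed connection as end of input, sends the extracted text as UTF-8, and closes the connection when it is done; check those assumptions against your own setup.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper, not part of Tika.
public class TikaSocketClient {

    // Sends the document to host:port and returns whatever the server writes back.
    public static String extractText(String host, int port, InputStream document)
            throws IOException, InterruptedException, ExecutionException {
        ExecutorService reader = Executors.newSingleThreadExecutor();
        try (Socket socket = new Socket(host, port)) {
            InputStream fromServer = socket.getInputStream();

            // Start draining the server's output BEFORE writing anything, so the
            // server never blocks on a full send buffer while we are still writing.
            Future<byte[]> extracted = reader.submit(() -> {
                ByteArrayOutputStream text = new ByteArrayOutputStream();
                byte[] chunk = new byte[8192];
                int len;
                while ((len = fromServer.read(chunk)) != -1) {
                    text.write(chunk, 0, len);
                }
                return text.toByteArray();
            });

            // Write the whole document on the calling thread.
            OutputStream toServer = socket.getOutputStream();
            byte[] buf = new byte[8192];
            int n;
            while ((n = document.read(buf)) != -1) {
                toServer.write(buf, 0, n);
            }
            toServer.flush();

            // Half-close: signals end of input but keeps the read side open.
            // Assumption: the server interprets this as "document complete".
            socket.shutdownOutput();

            // Wait for the reader thread to see the server close its side.
            // Assumption: the extracted text is UTF-8.
            return new String(extracted.get(), StandardCharsets.UTF_8);
        } finally {
            reader.shutdownNow();
        }
    }
}
```

The important detail is the order: the reader is started before the first byte is written, and the write side is only half-closed afterwards, so neither end can get stuck waiting for the other to empty its buffer.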
