All,
While integrating 2.0.0 trunk into Tika and running against govdocs1, I'm
finding two issues that are difficult to reproduce.
Background:
Tika-batch has a parent process that kicks off a Tika processor in a child
process; if the child dies unexpectedly, the parent kicks it off again. I'm running
with 10 consumer/parser threads and -Xmx5g on an 8 CPU/8 GB VM (RHEL 7, Linux
cloud-server-02 3.10.0-123.20.1.el7.x86_64 #1 SMP Wed Jan 21 09:45:55 EST 2015
x86_64 x86_64 x86_64 GNU/Linux).
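To make the setup concrete, here is a minimal sketch of the parent/child
arrangement described above. It is not Tika-batch's actual code: the class
name, the method names, and the maxRestarts limit are all hypothetical and
only illustrate the restart-on-abnormal-exit idea.

```java
import java.io.IOException;
import java.util.List;

// Hypothetical sketch of a parent watchdog; not Tika-batch's actual code.
final class ChildWatchdog {

    // Restart only after an abnormal (non-zero) exit, and only while restarts remain.
    static boolean shouldRestart(int exitCode, int restartsSoFar, int maxRestarts) {
        return exitCode != 0 && restartsSoFar < maxRestarts;
    }

    // Runs the child command, restarting per shouldRestart; returns the last exit code.
    static int runWithRestarts(List<String> command, int maxRestarts)
            throws IOException, InterruptedException {
        int restarts = 0;
        while (true) {
            Process child = new ProcessBuilder(command)
                    .inheritIO() // child stdout/stderr go to the parent's streams
                    .start();
            int exitCode = child.waitFor();
            // The exit value is the only thing the parent learns directly.
            System.err.println("child exited with value " + exitCode);
            if (!shouldRestart(exitCode, restarts, maxRestarts)) {
                return exitCode;
            }
            restarts++;
        }
    }
}
```

The point of the sketch is that the parent only ever sees the child's exit
value, which is why the two values below (1 and 137) are the main evidence.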
Two problems:
1) The child process exits with value 1. I'm catching Throwable around the
primary execution call in the child process and logging it; nothing shows up in
the log files from that part of the code. From the parser log files (at trace),
I can tell which 10 files were being processed at the time, but I'm not seeing
any other information about what caused the exit. When I run against just
those 10 files, all is ok.
2) The OS is killing the child (exit code 137) far more often than it does
with 1.8.9.
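For reference on the two exit values: on Linux, a child terminated by a signal
is reported as 128 plus the signal number, so 137 corresponds to signal 9
(SIGKILL), which is consistent with the OS killing the process. A value of 1
is an ordinary exit status, not a signal. A small sketch of that decoding,
with a hypothetical class and method name:

```java
// Hypothetical helper for reading child exit values on Linux.
final class ExitCodes {

    // Values from 129 to 255 conventionally mean "terminated by signal (value - 128)".
    static String describe(int exitCode) {
        if (exitCode == 0) {
            return "clean exit";
        }
        if (exitCode > 128 && exitCode < 256) {
            return "terminated by signal " + (exitCode - 128);
        }
        return "exited with status " + exitCode;
    }
}
```

With this, describe(137) reports signal 9, while describe(1) reports a plain
non-zero status, which matches the distinction between the two problems.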
For the second problem, I'll wait until the caching optimizations are
completed before I start worrying. However, do you have any
recommendations on how to figure out what's going on with 1)?
Thank you!
Cheers,
Tim