Dave Meikle, Tom and All,
How many of us are running Tika in Docker? If you are, how exactly are you
using it: single instance, swarm, Kubernetes, something else? Some people fear
an I/O hit with tika-server. What are your experiences?
I really like the ability to limit the number of CPUs available to a Docker
container. If a single doc causes multithreaded GC to go nuts, that won't take
down an entire machine. Running in a container also cleanly limits the risk
from XXE or arbitrary code execution, right?
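
To make the CPU cap concrete, here is a minimal sketch that builds a `docker run` command line with Docker's `--cpus` and `--memory` flags. The function name `tika_docker_command`, the limits of 2 CPUs and 2g, and the use of the `apache/tika` image on tika-server's default port 9998 are my assumptions for illustration, not a recommended configuration.

```python
import shlex


def tika_docker_command(cpus=2.0, memory="2g", image="apache/tika", host_port=9998):
    """Build a `docker run` argv that caps CPU and memory for one container.

    Assumptions (illustrative only): the image name, the limit values, and
    that the server inside the container listens on port 9998.
    """
    return [
        "docker", "run", "--rm",
        f"--cpus={cpus}",          # cap the CPU time the container may use
        f"--memory={memory}",      # hard memory limit for the container
        "-p", f"{host_port}:9998", # publish the server port on the host
        image,
    ]


if __name__ == "__main__":
    # Print a shell-safe command line; pass the list to subprocess.run to launch it.
    print(shlex.join(tika_docker_command()))
```

With a cap like this, a runaway parse can use at most the configured share of CPU and memory instead of everything on the host.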
If this is one of the ways of the future for big data, we might want to look
into hardening tika-server (OOMs, timeouts). What do you all think?
Cheers,
Tim
Timothy B. Allison, Ph.D.
Principal Artificial Intelligence Engineer
Group Lead
K83E/Human Language Technology
The MITRE Corporation
7515 Colshire Drive, McLean, VA 22102
703-983-2473 (phone); 703-983-1379 (fax)