[ https://issues.apache.org/jira/browse/TIKA-416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12862690#action_12862690 ]
Chris A. Mattmann commented on TIKA-416:
----------------------------------------

+1, this sounds like a great idea! We did some work on this in OODT in terms of simple external met extractors and so forth. Maybe we could follow a similar approach here. Check out:

http://svn.apache.org/repos/asf/incubator/oodt/cas-metadata/trunk/src/main/java/gov/nasa/jpl/oodt/cas/metadata/extractors/ExternMetExtractor.java

and

http://svn.apache.org/repos/asf/incubator/oodt/cas-metadata/trunk/src/main/resources/examples/extern-config.xml

as some examples of how to deal with this (NOTE, in OODT-3, we are still in the process of converting over the licenses and there are no "official" incubator releases of OODT yet, but I just wanted to let you know about it as some pointers to ways to get this done).

You rock and I can't wait for this feature!

> Out-of-process text extraction
> ------------------------------
>
> Key: TIKA-416
> URL: https://issues.apache.org/jira/browse/TIKA-416
> Project: Tika
> Issue Type: New Feature
> Components: parser
> Reporter: Jukka Zitting
> Priority: Minor
>
> There's currently no easy way to guard against JVM crashes or excessive
> memory or CPU use caused by parsing very large, broken or intentionally
> malicious input documents. To better protect against such cases and to
> generally improve the manageability of resource consumption by Tika it would
> be great if we had a way to run Tika parsers in separate JVM processes. This
> could be handled either as a separate "Tika parser daemon" or as an
> explicitly managed pool of forked JVMs.
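
The forked-JVM idea in the issue description could look roughly like the sketch below. It is only an illustration under stated assumptions, not Tika's or OODT's actual design: the class `ForkedExtraction` and its methods `buildCommand` and `run` are hypothetical names, and the child main class is a placeholder the caller supplies. The sketch caps the child's heap with `-Xmx` and kills the child if it exceeds a wall-clock timeout, so a crash or runaway parse stays confined to the child process.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.util.List;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: runs one extraction in a short-lived child JVM. */
class ForkedExtraction {

    /** Builds the child command line; every value is supplied by the caller. */
    static List<String> buildCommand(String javaBin, int maxHeapMb,
                                     String classpath, String mainClass, String input) {
        // -Xmx bounds the child's heap, so an OutOfMemoryError hits the child only.
        return List.of(javaBin, "-Xmx" + maxHeapMb + "m", "-cp", classpath, mainClass, input);
    }

    /** Runs the command, returning its combined output, or throws on timeout or failure. */
    static String run(List<String> command, Duration timeout)
            throws IOException, InterruptedException {
        // Writing to a temp file avoids blocking on a full pipe buffer.
        Path out = Files.createTempFile("extract", ".txt");
        try {
            Process child = new ProcessBuilder(command)
                    .redirectErrorStream(true)
                    .redirectOutput(out.toFile())
                    .start();
            if (!child.waitFor(timeout.toMillis(), TimeUnit.MILLISECONDS)) {
                // Too slow: kill the child and wait for it to go away.
                child.destroyForcibly().waitFor();
                throw new IOException("extraction timed out after " + timeout);
            }
            if (child.exitValue() != 0) {
                throw new IOException("child JVM exited with code " + child.exitValue());
            }
            return Files.readString(out);
        } finally {
            Files.deleteIfExists(out);
        }
    }
}
```

A caller would pass the result of `buildCommand(...)` to `run(...)` with whatever timeout suits its documents. Starting a JVM per document is slow, which is presumably why the issue suggests a daemon or a managed pool; the same heap cap and timeout logic would apply to each pooled child.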