[ 
https://issues.apache.org/jira/browse/TIKA-591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14342232#comment-14342232
 ] 

Luis Filipe Nassif commented on TIKA-591:
-----------------------------------------

I think this is very important. We are seeing problems on Linux while running 
the TesseractOCRParser that I think are related to this. Sometimes the trace 
is similar to those posted in HADOOP-5059, and sometimes it is outside of 
TesseractOCRParser, but I think it is related to memory corruption caused by 
an early fork/exec. Reducing the max heap of the JVM helps a bit, but does not 
solve the issue. I don't know the tika-batch code. Is it possible to use 
CompositeParser directly with tika-batch?

> Separate launcher process for forking JVMs
> -----------------------------------------
>
>                 Key: TIKA-591
>                 URL: https://issues.apache.org/jira/browse/TIKA-591
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>            Reporter: Jukka Zitting
>            Assignee: Jukka Zitting
>            Priority: Minor
>
> As a followup to TIKA-416, it would be good to implement at least optional 
> support for a separate launcher process for the ForkParser feature. The need 
> for such an extra process came up in JCR-2864 where a reference to 
> http://developers.sun.com/solaris/articles/subprocess/subprocess.html  was 
> made.
> To summarize, the problem is that the ProcessBuilder.start() call can result 
> in a temporary duplication of the memory space of the parent JVM. Even with 
> copy-on-write semantics this can be a fairly expensive operation and prone to 
> out-of-memory issues especially in large-scale deployments where the parent 
> JVM already uses the majority of the available RAM on a computer.
> A similar problem is also being discussed at HADOOP-5059.
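
For illustration only, the launcher idea in the description above could be
sketched roughly as follows. This is not Tika code and not the ForkParser
implementation: the class name LauncherSketch, the parseCommand helper and the
one-command-per-line stdin protocol are all assumptions made up for this
example. The point is that the launcher JVM is started once with a small heap,
so the ProcessBuilder.start() call happens in a small process rather than in
the large parent JVM.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical launcher sketch (not part of Tika). Intended to be started
 * once, early, with a small heap. The large parent JVM then writes one
 * command per line to this process's stdin instead of forking itself.
 */
public class LauncherSketch {

    /** Splits a request line on whitespace. Naive: no quoting or escaping. */
    public static List<String> parseCommand(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return new ArrayList<>();
        }
        return new ArrayList<>(Arrays.asList(trimmed.split("\\s+")));
    }

    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(
                new InputStreamReader(System.in, StandardCharsets.UTF_8));
        String line;
        while ((line = in.readLine()) != null) {
            List<String> command = parseCommand(line);
            if (command.isEmpty()) {
                continue; // ignore blank request lines
            }
            // The fork/exec happens here, in the small launcher process,
            // so the parent JVM's memory space is never duplicated.
            new ProcessBuilder(command)
                    .redirectOutput(ProcessBuilder.Redirect.INHERIT)
                    .redirectError(ProcessBuilder.Redirect.INHERIT)
                    .start();
        }
    }
}
```

A real implementation would also need a way to hand the child's streams back
to the parent and to report failures, which this sketch leaves out.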



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
