[
https://issues.apache.org/jira/browse/TIKA-2170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15645282#comment-15645282
]
Tim Allison edited comment on TIKA-2170 at 11/7/16 8:18 PM:
------------------------------------------------------------
I'm able to reproduce this problem. The ForkServer is shutting down with exit
value 0, which is a good thing.
I think the problem is that the ForkServer shuts down if it hasn't received or
sent any data in 5 seconds.
{noformat}
public void run() {
    try {
        while (active) {
            active = false;
            Thread.sleep(5000);
        }
        System.exit(0);
    } catch (InterruptedException e) {
    }
}
{noformat}
When I remove the call to sleep() in your example code, I'm not able to
reproduce the problem.
Even without your call to sleep, though, if a parser takes more than 5 seconds
to do something (say the parser slurps the entire input stream and then spends
a long time parsing it before writing any output), then the ForkServer will
shut down.
We could parameterize the amount of sleep before
shutting-down-on-no-stream-activity if that would help.
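
As a rough illustration of what parameterizing could look like, here is a minimal sketch of the same loop with the timeout passed in. It is not the actual ForkServer code: the class name IdleWatchdog, the touch() method, the onIdle callback (standing in for System.exit(0)) and the system property name "fork.idleTimeoutMillis" are all hypothetical.

```java
// Hypothetical sketch: the ForkServer idle loop with a configurable timeout.
final class IdleWatchdog implements Runnable {
    private final long idleTimeoutMillis;
    private final Runnable onIdle;            // stands in for System.exit(0)
    private volatile boolean active = true;   // set by the stream read/write paths

    IdleWatchdog(long idleTimeoutMillis, Runnable onIdle) {
        if (idleTimeoutMillis <= 0) {
            throw new IllegalArgumentException("idleTimeoutMillis must be > 0");
        }
        this.idleTimeoutMillis = idleTimeoutMillis;
        this.onIdle = onIdle;
    }

    // Call whenever data is received or sent.
    void touch() {
        active = true;
    }

    @Override
    public void run() {
        try {
            // Same logic as the original loop: if nothing called touch()
            // during a full sleep interval, the loop exits.
            while (active) {
                active = false;
                Thread.sleep(idleTimeoutMillis);
            }
            onIdle.run();
        } catch (InterruptedException e) {
            // Restore the interrupt flag and stop without firing onIdle.
            Thread.currentThread().interrupt();
        }
    }

    // Reads the timeout from a (hypothetical) system property,
    // falling back to the current hard-coded 5000 ms.
    static long timeoutFromProperty() {
        return Long.getLong("fork.idleTimeoutMillis", 5000L);
    }
}
```

With something like this, a caller whose parser is known to go quiet for long stretches could start the forked JVM with a larger value, e.g. -Dfork.idleTimeoutMillis=60000. Note that a longer timeout only narrows the window. A parser that is silent for longer than the configured value would still trigger the shutdown.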
> Tika 1.13 ForkParser fails intermittently with very large MS Word docx
> ----------------------------------------------------------------------
>
> Key: TIKA-2170
> URL: https://issues.apache.org/jira/browse/TIKA-2170
> Project: Tika
> Issue Type: Bug
> Components: parser
> Affects Versions: 1.13
> Environment: Windows 10
> Reporter: Tim Kingsbury
> Attachments: TikaForkParserExample.java, War and Peace.docx
>
>
> If the ForkParser is run in a for-loop over and over against a single large
> Microsoft Word DOCX file, it fails intermittently. Sometimes it will fail on
> the very first iteration. Sometimes it will run through several iterations
> before failing. Results are inconsistent.
> A small test application is enclosed. For the test, I use a Word docx with
> the full text of "War and Peace". 2.8MB, 1141 pages of text.