Yes, sorry, Tilman. I'm not running into problems with 1.8.9 and straight text extraction, though.
Following Timo's recommendation...looks like a memory issue. Let me know if I should post the full file or move to a more recent version of Java. :)

# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (malloc) failed to allocate 403177472 bytes for committing reserved memory.
# Possible reasons:
#   The system is out of physical RAM or swap space
...
# Out of Memory Error (os_linux.cpp:2798), pid=14958, tid=140419564971776
...
vm_info: OpenJDK 64-Bit Server VM (24.75-b04) for linux-amd64 JRE (1.7.0_75-b13), built on Jan 16 2015 09:15:47 by "mockbuild" with gcc 4.8.2 20140120 (Red Hat 4.8.2-16)

-----Original Message-----
From: Tilman Hausherr [mailto:[email protected]]
Sent: Monday, July 20, 2015 1:28 PM
To: [email protected]
Subject: Re: help debugging integration of PDFBox 2.0.0 trunk

On 20.07.2015 at 18:12, Allison, Timothy B. wrote:
> All,
> While integrating 2.0.0 trunk into Tika and running against govdocs1, I'm
> finding two issues that are difficult to reproduce.
>
> Background:
> Tika-batch has a parent process that kicks off a Tika processor in a child
> process, if that dies unexpectedly, the parent kicks it off again. I'm
> running with 10 consumer/parser threads and -Xmx5g on an (8 cpu/8GB vm); RHEL
> 7, Linux cloud-server-02 3.10.0-123.20.1.el7.x86_64 #1 SMP Wed Jan 21
> 09:45:55 EST 2015 x86_64 x86_64 x86_64 GNU/Linux)
>
> Two problems:
>
> 1) The child process exits with value 1. I'm catching Throwable around
> the primary execution call in the child process and logging it; nothing shows
> up in the log files from that part of the code. From the parser log files (at
> trace), I can tell which 10 files were being processed at the time, but I'm
> not seeing any other information about what caused the exit. When I run
> against just those 10 files, all is ok.
>
> 2) The OS is killing the child far more often than it does with 1.8.9
> (exit code 137).
>
> For the second problem, I'll wait until the optimizations to the caching are
> completed before I start worrying about that. However, do you have any
> recommendations on how to figure out what's going on with 1)?

I'm also having some problem with that system... with my test software, I have observed that java uses more and more space, despite it being told not to use more than a certain amount with -Xmx. After some time, the "process killer" kills the application.

Seems something changed in java memory management:
http://karunsubramanian.com/websphere/one-important-change-in-memory-management-in-java-8/

I did some investigation on this a few months ago, but gave up out of frustration.

Tilman

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
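
The two exit values in the quoted report (1 and 137) can be told apart mechanically. As a minimal sketch, assuming the Linux convention that a child killed by signal N is reported with exit value 128 + N (so 137 would correspond to signal 9, SIGKILL), a parent process could classify the value it gets back from Process.waitFor(). The class and method names below are hypothetical and are not part of Tika or PDFBox.

```java
// Hypothetical helper: classifies a child's exit value on Linux.
// Assumption: signal deaths are reported as 128 + signal number.
final class ExitCodes {

    /** Returns the signal number implied by the exit value, or -1 if none. */
    static int signalFromExitValue(int exitValue) {
        return (exitValue > 128 && exitValue < 256) ? exitValue - 128 : -1;
    }

    /** Human-readable guess at why the child ended. */
    static String describe(int exitValue) {
        if (exitValue == 0) {
            return "clean exit";
        }
        int signal = signalFromExitValue(exitValue);
        if (signal == 9) {
            return "killed by SIGKILL (signal 9), e.g. by the kernel OOM killer";
        }
        if (signal > 0) {
            return "terminated by signal " + signal;
        }
        // Values such as 1 mean the JVM exited by itself (e.g. System.exit(1)
        // or a fatal error), not that the OS killed it.
        return "exited on its own with status " + exitValue;
    }

    public static void main(String[] args) {
        System.out.println(describe(137));
        System.out.println(describe(1));
    }
}
```

Under that assumption, 137 points at the OS killing the child, while 1 points at the JVM shutting itself down, which is why the hs_err output above is the more useful lead for problem 1).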
