I am having a similar problem, but while indexing PDF documents using the
PDFBox parser (available at www.pdfbox.com). I get an exception saying
"Exception in thread "main" java.lang.OutOfMemoryError". Has anybody
implemented the above code? Any help appreciated.
Thanks!
PI
Rob Outar <[EMAIL PROTECTED]> wrote:
We are aware of DOM limitations/memory problems, but I am using SAX to parse
the file and index elements and attributes in my content handler.
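
For readers following along, here is a minimal sketch of what such a SAX
content handler might look like. It is not the code from this thread: the
class name SaxFieldCollector and the "fields" list are hypothetical stand-ins
for whatever adds fields to the index, and only JDK classes are used. The
point is that the handler keeps just the current element's text, never a tree
of the whole file.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class SaxFieldCollector extends DefaultHandler {
    // Hypothetical stand-in for "add this field to the index document".
    // A real indexer would hand each value on instead of keeping it here.
    private final List<String> fields = new ArrayList<>();
    // Text of the element currently being read; reset at every element boundary.
    private final StringBuilder text = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        text.setLength(0);
        for (int i = 0; i < atts.getLength(); i++) {
            fields.add(qName + "@" + atts.getQName(i) + "=" + atts.getValue(i));
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times per text node, so append rather than assign.
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String value = text.toString().trim();
        if (!value.isEmpty()) {
            fields.add(qName + "=" + value);
        }
        text.setLength(0);
    }

    // Parses the given XML string and returns the collected name/value pairs.
    public static List<String> collect(String xml) {
        SaxFieldCollector handler = new SaxFieldCollector();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("XML parsing failed", e);
        }
        return handler.fields;
    }
}
```

If a handler like this still runs out of memory, the usual suspect is whatever
the handler accumulates (here, the fields list), not the SAX parser itself.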

Thanks,

Rob

-----Original Message-----
From: Tatu Saloranta [mailto:[EMAIL PROTECTED]]
Sent: Friday, February 14, 2003 8:18 PM
To: Lucene Users List
Subject: Re: OutOfMemoryException while Indexing an XML file


On Friday 14 February 2003 07:27, Aaron Galea wrote:
> I had this problem when using xerces to parse xml documents. The problem I
> think lies in the Java garbage collector. The way I solved it was to create

It's unlikely that GC is the culprit. Current collectors are good at purging
objects that are unreachable, and only throw OutOfMemoryError when they really
have no other choice.
Usually it's the app that holds dangling references to objects, which prevents
GC from collecting objects that are no longer useful.

However, it's good to note that Xerces (and DOM parsers in general) generally
use more memory than the input XML files they process; this is because they
usually have to keep the whole document structure in memory, and there is
overhead on top of text segments. So it's likely to be at least 2 * input
file size (files usually use UTF-8, which most of the time uses 1 byte per
char; in memory, 16-bit UTF-16 chars are used for performance), plus some
additional overhead for storing element structure information and all that.

And since the default max. Java heap size is 64 megs, big XML files can cause
problems.
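
To make the arithmetic concrete, here is a rough sketch of the estimate
described above. The class and method names (HeapCheck, minDomBytes, mightFit)
are made up for illustration, and the factor of 2 covers only character data,
so real DOM usage will be higher.

```java
public class HeapCheck {
    // Rough lower bound: about 1 byte per char on disk becomes 2 bytes per
    // char in memory. Node and structure overhead is deliberately ignored.
    public static long minDomBytes(long fileSizeBytes) {
        return 2 * fileSizeBytes;
    }

    // True if the lower bound alone is still below the given heap limit.
    public static boolean mightFit(long fileSizeBytes, long maxHeapBytes) {
        return minDomBytes(fileSizeBytes) < maxHeapBytes;
    }

    public static void main(String[] args) {
        // Maximum heap this JVM will try to use, in bytes.
        long maxHeap = Runtime.getRuntime().maxMemory();
        long fileSize = 40L * 1024 * 1024; // hypothetical 40 MB XML file
        System.out.println("max heap bytes: " + maxHeap);
        System.out.println("40 MB file might fit: " + mightFit(fileSize, maxHeap));
    }
}
```

The limit can be raised when starting the JVM, for example
"java -Xmx256m HeapCheck", though that only postpones the problem if the app
is holding on to objects it no longer needs.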

More likely, however, is that references to already processed DOM trees are
not nulled in a loop or something like that. Especially if running one JVM
process per item solves the problem.
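
A sketch of the loop shape this suggests, assuming the documents are parsed
one after another with the JDK's DOM parser. DomLoop and countElements are
hypothetical names, and counting elements stands in for the real indexing
work. What matters is that each Document is a local variable inside the loop
and nothing outside the loop keeps a reference to it.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

public class DomLoop {
    // Parses each XML string in turn and returns the total number of elements.
    public static int countElements(List<String> xmlDocs) {
        int total = 0;
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            for (String xml : xmlDocs) {
                // The tree lives only in this local variable. Do not add it
                // to a list or field that outlives the iteration.
                Document doc = builder.parse(
                        new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
                // Keep only the small result, not the tree.
                total += doc.getElementsByTagName("*").getLength();
                // doc becomes unreachable here, so GC is free to reclaim it.
            }
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("XML parsing failed", e);
        }
        return total;
    }
}
```

If something like a List<Document> were filled inside that loop instead, every
tree would stay reachable and memory use would grow with each file, which
would also explain why one JVM per file makes the problem go away.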

> a shell script that invokes a java program for each xml file that adds it
> to the index.

-+ Tatu +-


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]




