crack...@comcast.net schrieb:
once you get comfortable with vtd-xml, few people will ever get back to DOM and SAX...

maybe you want to consider to contribute a vtd-xml based parsing implementation to Lucene ;-)

Thanks

Michael
----- Original Message ----- From: "Sithu D. Sudarsan" <sithu.sudar...@fda.hhs.gov> To: java-user@lucene.apache.org Sent: Friday, May 22, 2009 6:39:33 AM GMT -08:00 US/Canada Pacific Subject: RE: Parsing large xml files Thanks everyone for your useful suggestions/links. Lucene uses DOM and we tried with SAX. XML Pull & vtd-xml as well as Piccolo seem good. However, for now, we've broken the file into smaller chunks and then parsing it. When we get some time, we'ld like to refactor with the suggested ones. Erick: We do use Eclipse. But running from CLI gives the same error! May be there is a way to address the memory issues, but the current idea of breaking into smaller chunks have worked for now...

Sincerely, Sithu D Sudarsan -----Original Message----- From: Michael Wechner [mailto:michael.wech...@wyona.com] Sent: Friday, May 22, 2009 4:48 AM To: java-user@lucene.apache.org Subject: Re: Parsing large xml files crack...@comcast.net schrieb:
http://vtd-xml.sf.net

----- Original Message ----- From: "Sithu D. Sudarsan" <sithu.sudar...@fda.hhs.gov> To: java-user@lucene.apache.org Sent: Thursday, May 21, 2009 7:42:59 AM GMT -08:00 US/Canada Pacific Subject: Parsing large xml files

Hi, While trying to parse xml documents of about 50MB size, we run into OutOfMemoryError due to java heap space. Increasing JVM to use close
2GB
(that is the max), does not help. Is there any API that could be used
to
handle such large single xml files?

I am not familiar with that particular code of Lucene, but is it possible that Lucene is using DOM for this parsing? If so, one could try to replace it by SAX, and hence get rid of the OutOfMemory issue. Cheers Michael
If Lucene is not the right place, please let me know alternate places
to
look for, Thanks in advance, Sithu D Sudarsan sithu.sudar...@fda.hhs.gov sdsudar...@ualr.edu





--------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org

--------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org

Reply via email to