Yes, that is something worth thinking about... thanks for bringing this up...

----- Original Message -----
From: "Michael Wechner" <michael.wech...@wyona.com>
To: java-user@lucene.apache.org
Sent: Friday, May 22, 2009 11:41:51 AM GMT -08:00 US/Canada Pacific
Subject: Re: Parsing large xml files

crack...@comcast.net wrote:
> once you get comfortable with vtd-xml, few people will ever go back to DOM
> and SAX...
>

maybe you want to consider contributing a vtd-xml based parsing
implementation to Lucene ;-)

Thanks

Michael

> ----- Original Message -----
> From: "Sithu D. Sudarsan" <sithu.sudar...@fda.hhs.gov>
> To: java-user@lucene.apache.org
> Sent: Friday, May 22, 2009 6:39:33 AM GMT -08:00 US/Canada Pacific
> Subject: RE: Parsing large xml files
>
> Thanks everyone for your useful suggestions/links.
>
> Lucene uses DOM and we tried with SAX.
>
> XML Pull & vtd-xml as well as Piccolo seem good.
>
> However, for now, we've broken the file into smaller chunks and are
> parsing those.
>
> When we get some time, we'd like to refactor with the suggested ones.
>
> Erick: We do use Eclipse. But running from the CLI gives the same error!
> Maybe there is a way to address the memory issues, but the current idea of
> breaking the file into smaller chunks has worked for now...
>
>
> Sincerely,
> Sithu D Sudarsan
>
> -----Original Message-----
> From: Michael Wechner [mailto:michael.wech...@wyona.com]
> Sent: Friday, May 22, 2009 4:48 AM
> To: java-user@lucene.apache.org
> Subject: Re: Parsing large xml files
>
> crack...@comcast.net wrote:
>
>> http://vtd-xml.sf.net
>>
>>
>> ----- Original Message -----
>> From: "Sithu D. Sudarsan" <sithu.sudar...@fda.hhs.gov>
>> To: java-user@lucene.apache.org
>> Sent: Thursday, May 21, 2009 7:42:59 AM GMT -08:00 US/Canada Pacific
>> Subject: Parsing large xml files
>>
>>
>> Hi,
>>
>> While trying to parse xml documents of about 50MB size, we run into
>> OutOfMemoryError due to java heap space. Increasing the JVM heap to
>> close to 2GB (that is the max) does not help. Is there any API that
>> could be used to handle such large single xml files?
>>
>
> I am not familiar with that particular code of Lucene, but is it
> possible that Lucene is using DOM for this parsing?
> If so, one could try to replace it by SAX, and hence get rid of the
> OutOfMemory issue.
>
> Cheers
>
> Michael
>
>> If Lucene is not the right place, please let me know alternate places
>> to look for,
>>
>> Thanks in advance,
>> Sithu D Sudarsan
>> sithu.sudar...@fda.hhs.gov
>> sdsudar...@ualr.edu
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
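
To illustrate the DOM vs. SAX point raised in the thread, here is a minimal
sketch of streaming a file with the JDK's built-in SAX parser, so that only
one record is held in memory at a time instead of the whole tree. It assumes
the large file is a flat sequence of repeated elements. The class name
StreamingXmlExample, the method forEachRecord and the element name "record"
are made up for this example; none of them come from Lucene or from the
posters' code. Whether this avoids the OutOfMemoryError in a given setup
depends on what is done with each record afterwards.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class StreamingXmlExample {

    /**
     * Streams the input with SAX and hands the text content of every element
     * named recordElement to onRecord. Only the current record is buffered.
     * Assumption: records are not nested inside other records.
     */
    public static void forEachRecord(InputStream in, String recordElement,
                                     Consumer<String> onRecord) throws Exception {
        // The default factory is not namespace-aware, so qName is the raw tag name.
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(in, new DefaultHandler() {
            private final StringBuilder buffer = new StringBuilder();
            private boolean inRecord = false;

            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                if (qName.equals(recordElement)) {
                    inRecord = true;
                    buffer.setLength(0); // start a fresh record
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // May be called several times per text node, so append.
                if (inRecord) {
                    buffer.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (qName.equals(recordElement)) {
                    inRecord = false;
                    onRecord.accept(buffer.toString());
                }
            }
        });
    }

    public static void main(String[] args) throws Exception {
        // Tiny in-memory stand-in for a large file (hypothetical structure).
        String xml = "<records><record>first</record><record>second</record></records>";
        List<String> seen = new ArrayList<>();
        forEachRecord(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                "record", seen::add);
        System.out.println(seen);
    }
}
```

For a real file one would pass a FileInputStream instead of the in-memory
stream, and the callback is the place where each record could be turned into
an index document before the next one is read.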