Hello all,
 
I have just joined the list, so forgive me if these questions have been asked before.
 
I'm parsing large XML files using DOM4J's event system:
 
  private void addEntryHandler(SAXReader saxReader) {
      saxReader.addHandler("/root/entry",
          new ElementHandler() {
              public void onStart(ElementPath path) {}
              public void onEnd(ElementPath path) {
                  Element entry = path.getCurrent();
                  processEntry(entry);
                  entry.detach();   // prune the processed entry so it can be garbage collected
              }
          }
      );
  }
 
I believe this is the standard way. The processEntry(entry) method extracts info by means of
entry.element("name").getText()
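
For reference, processEntry() is roughly shaped like the sketch below; the child element names are placeholders, not the real ones from my schema.

  // Rough shape of processEntry() - element names are placeholders.
  private void processEntry(Element entry) {
      String name = entry.element("name").getText();
      // ... read the remaining child elements the same way and store the record
  }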
 
Now this works well, but the time used to parse the records increases linearly as the parsing progresses. The memory consumption of the parser increases, but only slightly. I can parse the first 10,000 records in approximately 10 seconds, whereas parsing entries 2,160,000 to 2,170,000 takes more than 5 minutes.
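
(To show how such per-10,000-record timings can be taken: a simple counter called from onEnd, along the lines of the sketch below, would do it. The field and method names are purely illustrative.)

  // Illustrative only: time each block of 10,000 entries to show the slowdown.
  private int entryCount = 0;
  private long blockStart = System.currentTimeMillis();

  private void logProgress() {
      if (++entryCount % 10000 == 0) {
          long now = System.currentTimeMillis();
          System.out.println(entryCount + " entries; last 10,000 took "
                  + ((now - blockStart) / 1000) + " s");
          blockStart = now;
      }
  }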
 
According to this article http://www.devx.com/Java/Article/29161/0/page/2 parsing of 'extremely large files' should not be a problem. However, my file is significantly larger than the 'extremely large' file used in the article (14 MB). It is ludicrously large - approximately 850 MB gzipped.
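
In case it is relevant, a gzipped file can be fed to the reader directly from a stream, roughly as sketched below; the file name is just an example, and this assumes SAXReader.read(InputStream), which dom4j provides.

  // Sketch: parse the gzipped file straight from a stream.
  // "entries.xml.gz" is a placeholder; saxReader has the handler above registered.
  private void parseGzipped(SAXReader saxReader) throws Exception {
      java.io.InputStream in = new java.util.zip.GZIPInputStream(
              new java.io.FileInputStream("entries.xml.gz"));
      saxReader.read(in);   // onStart/onEnd fire for each /root/entry while parsing
      in.close();
  }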
 
Have any of you experienced similar problems when parsing large files? Any input is appreciated.
 
Thanks,
Peter