Hello all,

I've just joined the list, so forgive me if this question has been asked
before. I'm parsing large XML files using DOM4J's event-based API:
private void addEntryHandler(SAXReader saxReader) {
    saxReader.addHandler("/root/entry",
        new ElementHandler() {
            public void onStart(ElementPath path) {}

            public void onEnd(ElementPath path) {
                Element entry = path.getCurrent();
                processEntry(entry);
                // prune the element once processed so the in-memory tree does not keep it
                entry.detach();
            }
        }
    );
}
This is, I believe, the standard way. The processEntry(entry) method extracts info by means of calls like entry.element("name").getText().
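Simplified, processEntry looks something like this (field names other than "name" are just placeholders):

private void processEntry(Element entry) {
    // pull the values we need out of the current <entry> element
    String name = entry.element("name").getText();
    String id   = entry.element("id").getText();   // "id" is a placeholder, not the real field
    // ... store/handle the extracted values ...
}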
Now this works well, but the time used to parse the records increases linearly as the parsing progresses. The memory consumption of the parser increases, but only slightly. I can parse the first 10,000 records in approx. 10 seconds, whereas parsing entries 2,160,000 to 2,170,000 takes more than 5 minutes.
According to this article, http://www.devx.com/Java/Article/29161/0/page/2, parsing of 'extremely large files' should not be a problem. However, my file is significantly larger than the 'extremely large' file used in the article (14 MB). It is ludicrously large - approx. 850 MB, gzipped.
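In case it is relevant, the parse is kicked off more or less like this (a sketch; the file name and stream setup are simplified, using java.util.zip.GZIPInputStream):

private void parseFile(String gzippedXml) throws Exception {
    SAXReader saxReader = new SAXReader();
    addEntryHandler(saxReader);
    // stream setup is a sketch; adjust to however the file is actually opened
    InputStream in = new GZIPInputStream(new FileInputStream(gzippedXml));
    saxReader.read(in);   // the handler above fires as each </entry> closes
    in.close();
}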
Have any of you experienced similar problems with the parsing of large files? Any input is appreciated.
Thanks,
Peter