On 07/04/11 17:59, Brian McBride wrote:
G'Day,

When doing some profiling of Jena on another matter, I noticed that the
IRI code seemed to consume a lot of time when parsing RDF/XML. On
further investigation, a minor modification to
com.hp.hpl.jena.iri.impl.AbsLexer results in a 15% speedup in IRI
creation and a 10% speedup in parsing RDF/XML, in my tests (YMMV).


How:

In com.hp.hpl.jena.iri.impl.AbsLexer

Replace
[[
final protected void rule(int rule) {
    parser.matchedRule(range,rule,yytext());
}
]]

with

[[
private static final boolean DEBUG = false;

final protected void rule(int rule) {
    if (DEBUG) {
        parser.matchedRule(range,rule,yytext());
    }
}
]]

Explanation:

This is debug code. As far as I can see, yytext() returns a copy of part
of the input buffer as a String. Parser.matchedRule only prints some
debug information if DEBUG is true, which it is not.

The method rule(int) is called rather a lot, hence compiling out this
code in AbsLexer results in a noticeable performance improvement.
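
To illustrate why the guard matters, here is a minimal, self-contained sketch. It is not the Jena code: GuardSketch, tokenText() and the copy counter are hypothetical stand-ins for AbsLexer, yytext() and the allocation it performs. The point it tries to show is that the argument is built before the call, so the copy is made even when the callee ends up printing nothing.

```java
class GuardSketch {
    // Compile-time constant: javac omits code guarded by if (DEBUG) when false.
    private static final boolean DEBUG = false;

    // Hypothetical stand-in for the lexer's input buffer.
    static final char[] BUFFER = "http://www.example.com/foo".toCharArray();

    // Counts how many times the token text was copied out of the buffer.
    static int copies = 0;

    // Stand-in for yytext(): allocates a new String on every call.
    static String tokenText() {
        copies++;
        return new String(BUFFER, 0, 4);
    }

    // Stand-in for Parser.matchedRule: only prints when debugging.
    static void matchedRule(int rule, String text) {
        if (DEBUG) {
            System.out.println("rule " + rule + " matched " + text);
        }
    }

    // Original shape: the argument is evaluated before the call,
    // so the copy happens even though nothing is printed.
    static void ruleUnguarded(int rule) {
        matchedRule(rule, tokenText());
    }

    // Patched shape: the whole call, including the copy, is skipped.
    static void ruleGuarded(int rule) {
        if (DEBUG) {
            matchedRule(rule, tokenText());
        }
    }

    // Runs the chosen variant 'calls' times and reports the copies made.
    static int copiesMade(boolean guarded, int calls) {
        copies = 0;
        for (int i = 0; i < calls; i++) {
            if (guarded) {
                ruleGuarded(i);
            } else {
                ruleUnguarded(i);
            }
        }
        return copies;
    }

    public static void main(String[] args) {
        System.out.println("unguarded copies: " + copiesMade(false, 1000));
        System.out.println("guarded copies:   " + copiesMade(true, 1000));
    }
}
```

With DEBUG false, the unguarded variant still makes one copy per call and the guarded variant makes none.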

I have run a test creating 10 million IRIs of the form
"http://www.example.com/foo/bar/bas#nnnnn". On my machine I see approx
15% performance improvement.
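
A timing loop of roughly this shape could be used for such a measurement. This is only a sketch under stated assumptions: it uses java.net.URI from the JDK as a stand-in so that it runs without Jena on the classpath, and IriTimingSketch, iriString() and timeCreate() are hypothetical names. To measure the change described here, the URI.create call would have to be replaced with the Jena IRI factory call.

```java
import java.net.URI;

class IriTimingSketch {
    // Accumulates a value derived from each result so the loop body
    // cannot be discarded as dead code.
    static volatile long sink = 0;

    // Builds a string of the form http://www.example.com/foo/bar/bas#nnnnn.
    static String iriString(int i) {
        return "http://www.example.com/foo/bar/bas#" + i;
    }

    // Creates n identifiers and returns the elapsed time in nanoseconds.
    // java.net.URI is a stand-in here, not the Jena IRI implementation.
    static long timeCreate(int n) {
        long local = 0;
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            URI u = URI.create(iriString(i));
            local += u.getFragment().length();
        }
        long elapsed = System.nanoTime() - start;
        sink += local;
        return elapsed;
    }

    public static void main(String[] args) {
        int n = 1_000_000;          // raise to 10 million to match the test above
        timeCreate(n);              // warm-up pass so the JIT has compiled the loop
        long ns = timeCreate(n);
        System.out.println(n + " identifiers in " + (ns / 1_000_000) + " ms");
    }
}
```

Running the same loop against builds with and without the patch, after a warm-up pass, gives the two numbers to compare.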

I have taken the test graphs from the test in [1] and read them into a
memory model. This runs about 10% faster with the mod I am suggesting
than without the mod.

com.hp.hpl.jena.iri.test.TestPackage still passes (green line).

I have attached a patch file rooted in the IRI project.

Brian

[1] https://jena.svn.sourceforge.net/svnroot/jena/TDB/trunk/src-dev/reports/ReportOutOfMemoryManyGraphsTDB.java

I think this is in auto-generated code from fragment.jflex.
If so, the change will get lost if it isn't made there.

Would it be better to tie it to Parser.DEBUG (and make that final) so one setting affects both?
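
Roughly what I have in mind, as a self-contained sketch rather than the real classes (SketchParser and SketchLexer are hypothetical stand-ins for Parser and AbsLexer, and the counters exist only to make the effect visible):

```java
class SketchParser {
    // Single switch for both classes. Being static final with a literal
    // initialiser makes it a compile-time constant.
    static final boolean DEBUG = false;

    // Counts calls that reached the parser, for illustration only.
    static int reported = 0;

    static void matchedRule(int rule, String text) {
        reported++;
        if (DEBUG) {
            System.out.println("rule " + rule + " matched " + text);
        }
    }
}

class SketchLexer {
    private final char[] buffer = "http://www.example.com/".toCharArray();

    // Counts copies of the token text, for illustration only.
    int copies = 0;

    // Stand-in for yytext(): allocates a new String on every call.
    String yytext() {
        copies++;
        return new String(buffer, 0, 4);
    }

    // The lexer consults the parser's flag instead of keeping its own.
    void rule(int rule) {
        if (SketchParser.DEBUG) {
            SketchParser.matchedRule(rule, yytext());
        }
    }

    public static void main(String[] args) {
        SketchLexer lexer = new SketchLexer();
        for (int i = 0; i < 5; i++) {
            lexer.rule(i);
        }
        System.out.println("copies=" + lexer.copies
                + " reported=" + SketchParser.reported);
    }
}
```

One caveat: javac inlines a compile-time constant into every class that reads it, so after flipping the flag both classes need recompiling.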

        Andy
