arne-bdt opened a new issue, #2740:
URL: https://github.com/apache/jena/issues/2740
### Version
5.2.0-SNAPSHOT
### Feature
Profiling shows that resolving IRIs takes a lot of the time spent parsing
RDF/XML (parsers: RRX.RDFXML_SAX, RRX.RDFXML_StAX_ev, RRX.RDFXML_StAX_sr).
I found two main ways to reduce it:
- In all current RRX parsers, many IRIs are parsed twice when
  `private Node iriResolve(String uriStr, ...` is called.
  --> I added `public Node createURI(IRIx iriX, ...);` to the
  ParserProfile, which simply uses the given IRI instead of resolving it again.
  --> In some cases this made the reader 25% faster.
- Adding general IRIx caching
  (`org.apache.jena.atlas.lib.cache.CacheSimple`) in the parsers where the
  already cached `org.apache.jena.riot.system.ParserProfileStd#resolver` is not
  applicable.
  --> In some cases this gave me another 10%.
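
To illustrate the two ideas above, here is a minimal sketch that does not use Jena at all. It assumes `java.net.URI` as a stand-in for `IRIx`, and the class `IriResolveCacheSketch`, its small direct-mapped cache, and both `createNode` overloads are hypothetical names invented for this example, not the actual ParserProfile or CacheSimple code.

```java
import java.net.URI;

/** Hypothetical stand-in, not Jena code: cached IRI resolution plus a "no re-parse" overload. */
final class IriResolveCacheSketch {
    private final URI base;
    private final String[] keys;   // cached input strings, one per slot
    private final URI[] values;    // resolved IRIs for the matching slot
    private int misses = 0;        // number of real resolutions performed

    IriResolveCacheSketch(String baseIri, int size) {
        this.base = URI.create(baseIri);
        this.keys = new String[size];
        this.values = new URI[size];
    }

    /** Resolve against the base; a slot collision simply overwrites the old entry. */
    URI resolve(String iriStr) {
        int slot = Math.floorMod(iriStr.hashCode(), keys.length);
        if (iriStr.equals(keys[slot])) {
            return values[slot];           // cache hit: no parsing, no resolving
        }
        misses++;
        URI resolved = base.resolve(iriStr);
        keys[slot] = iriStr;
        values[slot] = resolved;
        return resolved;
    }

    /** String overload: has to resolve (and therefore parse) the input. */
    String createNode(String iriStr) {
        return createNode(resolve(iriStr));
    }

    /** Overload taking an already resolved IRI: uses it as-is, without resolving again. */
    String createNode(URI resolved) {
        return "<" + resolved + ">";
    }

    int misses() {
        return misses;
    }

    public static void main(String[] args) {
        IriResolveCacheSketch sketch = new IriResolveCacheSketch("http://example.org/data/", 16);
        URI first = sketch.resolve("#_a1");
        URI second = sketch.resolve("#_a1");   // served from the cache
        System.out.println(sketch.createNode(first));
        System.out.println("same instance: " + (first == second));
        System.out.println("resolutions performed: " + sketch.misses());
    }
}
```

A caller that already holds the resolved value passes it to the second overload, so the string is parsed only once. Repeated references to the same IRI string hit the cache instead of the resolver.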
### Are you interested in contributing a solution yourself?
Yes
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]