Aklakan opened a new issue, #1470: URL: https://github.com/apache/jena/issues/1470
### Version

4.6.0-SNAPSHOT

### What happened?

I started looking again into the issues I had with Jena in Spark settings, related to https://issues.apache.org/jira/browse/JENA-2309. Right now I am investigating a long-standing performance issue: concurrent processing time does not scale directly with the number of cores. Concretely, I am comparing our Spark + Jena 4 based tarql re-implementation with the original tarql (Jena 2).

One culprit is the jena-iri package, which uses synchronized singleton lexers that introduce locking overhead between the parsing threads. Making those lexers thread-local reduces the overhead (sketched below). On my notebook, in power-save and performance mode, I get these improvements:

* jena-plain: power save: 68 sec, performance: 21 sec
* thread-local fix: power save: 54 sec, performance: 19 sec

Profiler output:

A related issue I am currently investigating is that a lot of time is spent in the IRI parsing machinery, e.g. via E_IRI. For testing I changed it to return the argument as given, which reduced the total processing time (in performance mode) from 19 to 13 seconds - around 30% - time that is predominantly spent in the jena-iri lexers. However, I am not yet sure whether anything can be optimized there without compromising functionality. A workaround could also be a custom function that performs fewer checks (also sketched below).

### Relevant output and stacktrace

_No response_

### Are you interested in making a pull request?

Yes
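
For reference, a minimal sketch of the change from a synchronized singleton to a thread-local lexer. The `Lexer` class below is only a stand-in for the generated lexers in jena-iri (the real class and method names differ); only the sharing pattern is meant to be accurate:

```java
/**
 * Sketch of the singleton-vs-thread-local lexer pattern.
 * "Lexer" stands in for the stateful, non-thread-safe lexers in jena-iri.
 */
public class LexerSharingSketch {

    /** Minimal stand-in for a stateful, non-thread-safe lexer. */
    static final class Lexer {
        private String input;                 // mutable state -> not thread-safe
        String tokenize(String iri) {
            this.input = iri;
            return this.input.toLowerCase();  // placeholder for the real lexing work
        }
    }

    // Current shape (simplified): one shared instance that all callers
    // synchronize on, so every parsing thread contends for the same lock.
    private static final Lexer SHARED = new Lexer();
    static String parseShared(String iri) {
        synchronized (SHARED) {
            return SHARED.tokenize(iri);
        }
    }

    // Proposed shape: one lexer instance per thread, no inter-thread locking.
    private static final ThreadLocal<Lexer> LOCAL = ThreadLocal.withInitial(Lexer::new);
    static String parseThreadLocal(String iri) {
        return LOCAL.get().tokenize(iri);
    }
}
```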
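And a minimal sketch of the "custom function with fewer checks" workaround, built on ARQ's `FunctionBase1`. The function URI is made up, and this deliberately skips base-IRI resolution and all IRI violation checks, so it is not a drop-in replacement for `IRI()`:

```java
import org.apache.jena.graph.NodeFactory;
import org.apache.jena.sparql.expr.ExprEvalException;
import org.apache.jena.sparql.expr.NodeValue;
import org.apache.jena.sparql.function.FunctionBase1;
import org.apache.jena.sparql.function.FunctionRegistry;

/** Builds an IRI node from a string without going through the jena-iri checks. */
public class FastIri extends FunctionBase1 {

    @Override
    public NodeValue exec(NodeValue v) {
        if (v.isIRI()) {
            return v;                                   // already an IRI: pass through
        }
        if (!v.isString()) {
            throw new ExprEvalException("fastIri: expected a string, got " + v);
        }
        // No resolution against a base IRI and no scheme/violation checks here -
        // that is exactly the work being skipped.
        return NodeValue.makeNode(NodeFactory.createURI(v.getString()));
    }

    /** Register under a made-up URI so it can be called from queries. */
    public static void register() {
        FunctionRegistry.get().put("http://example.org/fn#fastIri", FastIri.class);
    }
}
```

After calling `FastIri.register()`, the function can be used in a query as `<http://example.org/fn#fastIri>(?s)` in place of `IRI(?s)`.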
