Aklakan opened a new issue, #1470: URL: https://github.com/apache/jena/issues/1470
### Version

4.6.0-SNAPSHOT

### What happened?

I started looking again into the issues I had with Jena in Spark settings, related to https://issues.apache.org/jira/browse/JENA-2309. Right now I am investigating a long-standing performance issue: concurrent processing time does not scale directly with the number of cores. Concretely, I am comparing our Spark + Jena 4 based tarql re-implementation with the original tarql (Jena 2).

One culprit is the jena-iri package, which uses synchronized singleton lexers that introduce locking overhead between the parsing threads. Making those lexers thread-local reduces the overhead (sketched below). On my notebook, in power-save and performance mode, I get these improvements:

* jena-plain: power save: 68 sec, performance: 21 sec
* thread-local fix: power save: 54 sec, performance: 19 sec

Profiler output:

A related issue I am currently investigating is that a lot of time is spent in the IRI parsing machinery, e.g. via E_IRI. For testing I changed it to return the argument as given, which reduced the total processing time (in performance mode) from 19 to 13 seconds - around 30% - time that is predominantly spent in the jena-iri lexers. However, I am not yet sure whether anything can be optimized there without compromising functionality. A workaround could also be a custom function that performs fewer checks (also sketched below).

### Relevant output and stacktrace

_No response_

### Are you interested in making a pull request?

Yes
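
For reference, a minimal sketch of the change from a synchronized singleton to a thread-local lexer. The `Lexer` class below is only a stand-in for the generated lexers in jena-iri (the real class and method names differ); only the sharing pattern is meant to be accurate:

```java
/**
 * Sketch of the singleton-vs-thread-local lexer pattern.
 * "Lexer" stands in for the stateful, non-thread-safe lexers in jena-iri.
 */
public class LexerSharingSketch {

    /** Minimal stand-in for a stateful, non-thread-safe lexer. */
    static final class Lexer {
        private String input;                 // mutable state -> not thread-safe
        String tokenize(String iri) {
            this.input = iri;
            return this.input.toLowerCase();  // placeholder for the real lexing work
        }
    }

    // Current shape (simplified): one shared instance that all callers
    // synchronize on, so every parsing thread contends for the same lock.
    private static final Lexer SHARED = new Lexer();
    static String parseShared(String iri) {
        synchronized (SHARED) {
            return SHARED.tokenize(iri);
        }
    }

    // Proposed shape: one lexer instance per thread, no inter-thread locking.
    private static final ThreadLocal<Lexer> LOCAL = ThreadLocal.withInitial(Lexer::new);
    static String parseThreadLocal(String iri) {
        return LOCAL.get().tokenize(iri);
    }
}
```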
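And a minimal sketch of the "custom function with fewer checks" workaround, built on ARQ's `FunctionBase1`. The function URI is made up, and this deliberately skips base-IRI resolution and all IRI violation checks, so it is not a drop-in replacement for `IRI()`:

```java
import org.apache.jena.graph.NodeFactory;
import org.apache.jena.sparql.expr.ExprEvalException;
import org.apache.jena.sparql.expr.NodeValue;
import org.apache.jena.sparql.function.FunctionBase1;
import org.apache.jena.sparql.function.FunctionRegistry;

/** Builds an IRI node from a string without going through the jena-iri checks. */
public class FastIri extends FunctionBase1 {

    @Override
    public NodeValue exec(NodeValue v) {
        if (v.isIRI()) {
            return v;                                   // already an IRI: pass through
        }
        if (!v.isString()) {
            throw new ExprEvalException("fastIri: expected a string, got " + v);
        }
        // No resolution against a base IRI and no scheme/violation checks here -
        // that is exactly the work being skipped.
        return NodeValue.makeNode(NodeFactory.createURI(v.getString()));
    }

    /** Register under a made-up URI so it can be called from queries. */
    public static void register() {
        FunctionRegistry.get().put("http://example.org/fn#fastIri", FastIri.class);
    }
}
```

After calling `FastIri.register()`, the function can be used in a query as `<http://example.org/fn#fastIri>(?s)` in place of `IRI(?s)`.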
