Aklakan commented on issue #1470:
URL: https://github.com/apache/jena/issues/1470#issuecomment-1207204881

   Adding a cache to IRIx should be simple, and I can check how much it improves performance.
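
A minimal sketch of what such a cache could look like, assuming a per-thread LRU map so that lookups need no locking. The class name `ParseCache` and its parameters are hypothetical and not part of Jena; the `parser` function stands in for whatever creates the parsed IRI object (for example a factory such as `IRIx::create`, if that fits the call site).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper, not a Jena class: caches the result of an expensive
// String -> V parse step in a small per-thread LRU map.
final class ParseCache<V> {
    private final Function<String, V> parser;
    private final ThreadLocal<Map<String, V>> cache;

    ParseCache(Function<String, V> parser, int maxSize) {
        this.parser = parser;
        // Each thread gets its own access-ordered LinkedHashMap, so no
        // synchronization between threads is needed on lookups.
        this.cache = ThreadLocal.withInitial(() ->
            new LinkedHashMap<String, V>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<String, V> eldest) {
                    // Evict the least recently used entry once the cap is exceeded.
                    return size() > maxSize;
                }
            });
    }

    V get(String key) {
        Map<String, V> local = cache.get();
        V value = local.get(key);
        if (value == null) {
            // Cache miss: parse once and remember the result for this thread.
            value = parser.apply(key);
            local.put(key, value);
        }
        return value;
    }
}
```

The trade-off in this sketch is memory for contention: every worker thread keeps its own copy of hot entries, which may be acceptable when the set of distinct IRIs per thread is small.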
   
   How does the iri4ld implementation differ from Jena's current default one in terms of functionality?
   Having less (needless) synchronization between threads is always better.
   
   > FYI: https://github.com/tarql/tarql/pull/99 upgrades tarql to Apache Jena 4.5.0

   Good to know that it's possible to compare the performance of Spark-based tarql to the original tarql within Jena 4! :)
   Especially because the same IRI machinery is then used in both.
   
   
   I noticed that E_BNode also causes waits due to synchronization in a SecureRandom instance.
   My Spark job's runtime (using a test mapping without iri()) jumps from ~4.5 to ~10 seconds just by adding a dummy bnode() call:
   ```sparql
   CONSTRUCT { <urn:example:s> <urn:example:p> ?a, ?b, ?c } # ... 16 columns in total
   FROM <file:data.csv>
   WHERE { BIND(bnode(?a) AS ?foobar) }
   ```
   
   The same job with tarql/jena2 takes somewhere between 50 and 60 seconds, tending towards 60 seconds with bnode, so the effect is less visible there.
   It seems that threads competing for the bnode call are also a bottleneck.
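
One way to picture a fix is to keep the shared random source off the hot path. The sketch below is an assumption-laden illustration, not Jena's E_BNode implementation: the class `PerThreadLabels` is hypothetical, and it draws one random prefix per thread from a shared `SecureRandom`, then generates labels from a plain per-thread counter.

```java
import java.security.SecureRandom;

// Hypothetical label allocator, not Jena code: the shared SecureRandom is
// used once per thread, so label generation itself is contention-free.
final class PerThreadLabels {
    // Shared RNG; only consulted when a thread initializes its state.
    private static final SecureRandom SEED_SOURCE = new SecureRandom();

    private static final class State {
        // Random per-thread prefix keeps labels from different threads apart
        // (assumption: a 64-bit prefix collision is negligibly unlikely).
        final String prefix = Long.toHexString(SEED_SOURCE.nextLong());
        long counter = 0;
    }

    private static final ThreadLocal<State> STATE =
        ThreadLocal.withInitial(State::new);

    // Returns a fresh label without touching any shared mutable state.
    static String next() {
        State s = STATE.get();
        return "b" + s.prefix + "_" + (s.counter++);
    }
}
```

Whether this is appropriate depends on how unpredictable the labels need to be; the sketch only guarantees uniqueness within a thread and relies on the random prefix across threads.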
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

