[jira] [Commented] (JENA-1862) Query.cloneQuery is slow

Claus Stadler (Jira) Wed, 18 Mar 2020 10:23:25 -0700


    [ 
https://issues.apache.org/jira/browse/JENA-1862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17061928#comment-17061928
 ]


Claus Stadler commented on JENA-1862:
-------------------------------------

Hi Andy, I have implemented the described approach and created a test case that 
runs every *.rq file under arq/testing that parses successfully through it:

https://github.com/apache/jena/compare/master...Aklakan:JENA-1862#diff-f94bfaf468aaab2ba094bf845abaf4a3R76

Status: of 1234 queries in total, 6 caused errors and 9 deviate from the 
result of the original clone method, so the vast majority work.
At first glance, most deviations are simply due to base IRIs getting expanded 
with the current working directory. I am not sure how much plumbing it would 
take to disable that in the current syntax transform code :/
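
A rough sketch of how such a comparison could be tallied, using only the JDK. The class name CloneComparison and its Report type are made up for illustration and are not part of Jena or of the linked branch; the two clone strategies are passed in as plain string-to-string functions, so nothing here depends on how the clone itself is implemented.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical harness (not Jena code): compares two clone strategies over all *.rq files. */
final class CloneComparison {

    /** Outcome per file: same output, different output, or candidate threw. */
    static final class Report {
        final List<Path> identical = new ArrayList<>();
        final List<Path> deviations = new ArrayList<>();
        final List<Path> errors = new ArrayList<>();
    }

    /**
     * referenceClone and candidateClone each take query text and return the
     * serialized clone. Files the reference path cannot handle are skipped,
     * mirroring "all *.rq that were parsable".
     */
    static Report compare(Path root,
                          UnaryOperator<String> referenceClone,
                          UnaryOperator<String> candidateClone) throws IOException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(root)) {
            files = walk.filter(Files::isRegularFile)
                        .filter(p -> p.getFileName().toString().endsWith(".rq"))
                        .sorted()
                        .collect(Collectors.toList());
        }
        Report report = new Report();
        for (Path file : files) {
            String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            String expected;
            try {
                expected = referenceClone.apply(text);
            } catch (RuntimeException e) {
                continue; // not parsable by the reference path: leave it out of the tally
            }
            try {
                String actual = candidateClone.apply(text);
                if (expected.equals(actual)) {
                    report.identical.add(file);
                } else {
                    report.deviations.add(file);
                }
            } catch (RuntimeException e) {
                report.errors.add(file); // e.g. an exception while serializing the clone
            }
        }
        return report;
    }
}
```

With Jena on the classpath, the reference function could presumably be `s -> QueryFactory.create(s).cloneQuery().toString()` and the candidate `s -> fastClone(QueryFactory.create(s)).toString()`, with fastClone being the method from the issue description; that wiring is an assumption and is not exercised by the sketch.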

The errors seem to occur for queries such as 
./jena-arq/testing/DAWG-Final/syntax-sparql2/syntax-form-describe01.rq, which 
contains only "DESCRIBE <u>":

{code}
java.lang.NullPointerException
        at org.apache.jena.sparql.serializer.FormatterElement.visit(FormatterElement.java:310)
        at org.apache.jena.sparql.syntax.ElementGroup.visit(ElementGroup.java:120)
        at org.apache.jena.sparql.serializer.FormatterElement.visitAsGroup(FormatterElement.java:442)
        at org.apache.jena.sparql.serializer.QuerySerializer.visitQueryPattern(QuerySerializer.java:207)
        at org.apache.jena.query.Query.visit(Query.java:756)
        at org.apache.jena.query.Query.serialize(Query.java:914)
        at org.apache.jena.query.Query.serialize(Query.java:891)
        at org.apache.jena.query.Query.serialize(Query.java:881)
        at org.apache.jena.query.Query.serialize(Query.java:842)
        at org.apache.jena.query.Query.toString(Query.java:787)
{code}

I have also already fixed a glitch in my code where named graphs were 
incorrectly added as default graphs: 
https://github.com/apache/jena/blob/master/jena-arq/src/main/java/org/apache/jena/sparql/syntax/syntaxtransform/QueryTransformOps.java#L169


> Query.cloneQuery is slow
> ------------------------
>
>                 Key: JENA-1862
>                 URL: https://issues.apache.org/jira/browse/JENA-1862
>             Project: Apache Jena
>          Issue Type: Improvement
>          Components: ARQ
>    Affects Versions: Jena 3.14.0
>            Reporter: Claus Stadler
>            Priority: Major
>
> This is a follow-up to JENA-1861, which is about the thread safety of Query 
> objects. As sharing a Query object across threads may introduce race 
> conditions, the obvious workaround would be to just clone it. However, the 
> current implementation serializes the query to a string and then re-parses 
> it, which causes a very high overhead (for my use case it became the top 
> entry in the VisualVM profiler).
> Proposal:
>  * Extend ElementTransformCopyBase with an 'alwaysCopy' flag analogous to 
> ExprTransformCopy
>  * Add a new constructor 
> ExprTransformApplyElementTransform(ElementTransform transform, boolean 
> alwaysCopy) so that the alwaysCopy flag of the underlying ExprTransformCopy 
> can be set to true
>  * Implement clone using syntactic transforms as below
> {code:java}
> public static Query fastClone(Query query) {
>     ElementTransform eltXform = new ElementTransformCopyBase2(true);
>     ExprTransform exprXform = new ExprTransformApplyElementTransform2(eltXform, true);
>     Query result = QueryTransformOps.transform(query, eltXform, exprXform);
>     return result;
> }
> {code}
> This approach 'works-for-me' and I can create a pull request for this, but 
> maybe there are more subtleties to the outlined approach that need to be 
> considered?
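
To illustrate what the 'alwaysCopy' flag in the quoted proposal is presumably for: a copy-on-change transform hands back the original node when nothing changed, which is fine for rewriting but useless for cloning, because the "clone" would still share objects with the original. Below is a minimal sketch on a toy tree. ToyNode, ToyVar, ToyPair and ToyCopyTransform are invented for this example and are not Jena classes; copying leaves as well is a choice of the sketch, not a claim about how ExprTransformCopy behaves.

```java
/** Toy syntax tree (NOT Jena's classes): either a variable leaf or a pair of children. */
abstract class ToyNode { }

final class ToyVar extends ToyNode {
    final String name;
    ToyVar(String name) { this.name = name; }
}

final class ToyPair extends ToyNode {
    final ToyNode left;
    final ToyNode right;
    ToyPair(ToyNode left, ToyNode right) { this.left = left; this.right = right; }
}

/** Identity transform that either reuses unchanged nodes or always builds fresh ones. */
final class ToyCopyTransform {
    private final boolean alwaysCopy;

    ToyCopyTransform(boolean alwaysCopy) { this.alwaysCopy = alwaysCopy; }

    ToyNode apply(ToyNode node) {
        if (node instanceof ToyVar) {
            ToyVar v = (ToyVar) node;
            // Assumption of this sketch: leaves are copied too when alwaysCopy is set.
            return alwaysCopy ? new ToyVar(v.name) : v;
        }
        ToyPair p = (ToyPair) node;
        ToyNode l = apply(p.left);
        ToyNode r = apply(p.right);
        boolean unchanged = (l == p.left) && (r == p.right);
        // Copy-on-change: reuse the node unless a child changed or copying is forced.
        return (!alwaysCopy && unchanged) ? p : new ToyPair(l, r);
    }
}
```

With alwaysCopy set to false, applying the transform to a tree returns the very same object; with true, every node in the result is a new instance, which is the property a clone needs.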



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
