[ 
https://issues.apache.org/jira/browse/JENA-712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14028979#comment-14028979
 ] 

Rob Vesse commented on JENA-712:
--------------------------------

[~harschware] What Andy is getting at is that the URI specification (RFC 3986)
includes a whole section on how a URI processor determines the Base URI.  In
certain circumstances the Base URI is well defined, whereas in others it is up
to the processor.  He was also making the point that for some services the
Base URI can differ depending on how the request is made: a {{GET}} and a
{{POST}} would have different Base URIs because of the query string present in
a {{GET}} request URI but not in a {{POST}}.

[~andy.seaborne] I agree with your comment:

bq. would have thought that updates (or queries but maybe less so) that behaved
unexpectedly would be something we'd have heard about at some point. This is 
not a new area of code.

Given this, keeping the current behaviour as the default is probably the safer
option, since we don't know who is relying on it and they may not even be
aware that they are.  I suspect most people are sensible enough to avoid
relative URIs entirely, since their inherent instability leads to all sorts
of interesting problems like this one.

Explaining our use case in slightly more detail may make it clearer why we
use relative URIs.  We have a test harness, a wrapper around the SPARQL Query
Benchmarker framework, that generates test data of a desired size relative to
the location where it is running.  The generated data is used by predefined
test suites containing queries and updates that use relative URIs, because
they don't know exactly where the harness will be running.  The desired
behaviour is therefore that the URIs are resolved relative to the harness and
then remain absolute from that point on.  As described in this bug, that turns
out not to be the case because of how ARQ treats the implicit base.
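To make the failure mode concrete, here is a minimal sketch using only {{java.net.URI}} rather than ARQ itself: the client resolves {{<path/to/thing>}} against its working directory, the relative form is written out with no {{BASE}}, and the server resolves that text against its own working directory.  The two directory URIs are hypothetical stand-ins, and this only approximates what ARQ's {{IRIResolver}} and serialiser do.

```java
import java.net.URI;

public class ImplicitBaseMismatch {

    /** Text a serialiser emits if it relativises against the implicit base and omits BASE. */
    static String serialiseWithoutBase(URI implicitBase, URI absolute) {
        return implicitBase.relativize(absolute).toString();
    }

    /** What a receiver gets by resolving that text against its own base. */
    static URI reparse(URI receiverBase, String text) {
        return receiverBase.resolve(text);
    }

    public static void main(String[] args) {
        // Hypothetical working directories of the client and of the server
        URI clientBase = URI.create("file:///home/client/harness/");
        URI serverBase = URI.create("file:///srv/sparql/");

        // Step 1: the client resolves <path/to/thing> against its implicit base
        URI onClient = clientBase.resolve("path/to/thing");
        // Step 2: the relative form is written out with no BASE declaration
        String sent = serialiseWithoutBase(clientBase, onClient);
        // Step 3: the server resolves the same text against a different base
        URI onServer = reparse(serverBase, sent);

        System.out.println("client resolved: " + onClient);
        System.out.println("text sent:       <" + sent + ">");
        System.out.println("server resolved: " + onServer);
        System.out.println("same IRI? " + onClient.equals(onServer));
    }
}
```

Running it reports {{same IRI? false}}: the transmitted text is identical on both sides, but the IRI the server ends up with is not the one the client meant.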

I know you have concerns about adding additional options, but if ARQ is meant
to be as extensible and general-purpose a SPARQL processor as possible, we
should provide appropriate options to tweak behaviour where necessary, even if
most users never need to know or care that they exist.  In this case the change
needed to allow configurability is minimal if we go with the
{{JENA-712-ConfigurableOutputImplicitBase}} patch, which adds the
configurability while keeping the current behaviour as the default, so I don't
see it causing much harm.
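One way such configurability could look, sketched with a hypothetical {{RelativeIriWriter}} class and {{writeImplicitBase}} flag (neither exists in ARQ, and this is not how the attached patches are implemented): IRIs under the base are written in relative form, and a {{BASE}} line is emitted only when the flag is on and at least one relative IRI was actually written.  The single-argument constructor leaves the flag off, i.e. today's behaviour.

```java
import java.net.URI;
import java.util.List;

/** Hypothetical sketch, not ARQ's API: relative IRI output with an optional BASE line. */
public class RelativeIriWriter {
    private final URI base;
    private final boolean writeImplicitBase;

    public RelativeIriWriter(URI base, boolean writeImplicitBase) {
        this.base = base;
        this.writeImplicitBase = writeImplicitBase;
    }

    /** Default keeps the current behaviour: relative IRIs, no BASE. */
    public RelativeIriWriter(URI base) {
        this(base, false);
    }

    public String write(List<URI> iris) {
        StringBuilder body = new StringBuilder();
        boolean anyRelative = false;
        for (URI iri : iris) {
            // relativize() returns the IRI unchanged when it is not under the base
            URI written = base.relativize(iri);
            if (!written.isAbsolute()) {
                anyRelative = true;
            }
            if (body.length() > 0) {
                body.append(' ');
            }
            body.append('<').append(written).append('>');
        }
        // Only declare the base when the output actually depends on it
        if (writeImplicitBase && anyRelative) {
            return "BASE <" + base + ">\n" + body;
        }
        return body.toString();
    }
}
```

Writing {{BASE}} only when something relative was output keeps fully absolute output unchanged, which matches the proposal of writing a {{BASE}} "if relative URIs are being output".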

It is worth noting that we have found a couple of workarounds for this:

1. Call {{setBaseURI((String)null)}} on the query/update, which causes the
{{Prologue}} to have the explicit base flag set, so the {{BASE}} is written out
as seen in my original code attachment
2. Add a {{BASE <>}} declaration to the start of queries/updates, which has the
same effect as the above
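The second workaround needs nothing from ARQ beyond the parser, so it can be applied to query or update strings before they are parsed.  A minimal sketch, with {{ExplicitBase.withEmptyBase}} as a hypothetical helper name: the SPARQL grammar allows {{BASE}} and {{PREFIX}} declarations in any order in the prologue, so prepending {{BASE <>}} keeps the string valid, and {{<>}} resolves to whatever the implicit base would have been.

```java
/** Hypothetical helper for the "BASE <>" workaround; plain string handling only. */
public final class ExplicitBase {
    private ExplicitBase() {}

    static String withEmptyBase(String sparql) {
        // <> resolves against the implicit base, turning it into an explicit one
        return "BASE <>\n" + sparql;
    }

    public static void main(String[] args) {
        System.out.println(withEmptyBase("SELECT * WHERE { <path/to/thing> a ?type }"));
    }
}
```

The result would then be passed to {{QueryFactory.create()}} or {{UpdateFactory.create()}} as before.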

> ARQ serialises queries and updates using relative URIs but does not include a 
> BASE clause
> -----------------------------------------------------------------------------------------
>
>                 Key: JENA-712
>                 URL: https://issues.apache.org/jira/browse/JENA-712
>             Project: Apache Jena
>          Issue Type: Bug
>          Components: ARQ
>    Affects Versions: Jena 2.11.2
>            Reporter: Rob Vesse
>         Attachments: JENA-712-AlwaysWriteBase.patch, 
> JENA-712-ConfigurableOutputImplictBase.patch, 
> JENA-712-ConfigurableOutputImplictBaseOnByDefault.patch, 
> SparqlRelativeUriTreatment.java
>
>
> An internal discussion with [~harschware] has raised what we think is a bug 
> in ARQ's behaviour, though it is somewhat open to interpretation and so may be
> controversial.
> The code I will attach demonstrates the issue.
> The problem arises as follows:
> 1 - When given a query/update with a relative URI, ARQ resolves it against an
> implicit Base URI of the current working directory
> 2 - When {{toString()}} is applied to the parsed {{Query}} or {{UpdateRequest}},
> URIs are output in relative form against the implicit Base URI, but no {{BASE}}
> clause is output
> 3 - The query is transmitted to a different system which has a different
> working directory and so interprets it differently, resulting in unexpected
> behaviour/errors
> This causes us issues because the relative URIs are valid relative to the
> working directory of the client but not relative to the working directory of
> the server, so we want absolute URIs to be transmitted to the server.
> For example given the following query string:
> {noformat}
> SELECT * WHERE { <path/to/thing> a ?type }
> {noformat}
> Calling {{toString()}} on the resulting {{Query}} object gives the following:
> {noformat}
> SELECT  *
> WHERE
>   { <path/to/thing> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ?type }
> {noformat}
> This does not include the {{BASE}} declaration.  If, however, we force the
> {{Query}} object to have a null base via {{setBaseURI((String)null)}}, ARQ prints
> the following when {{toString()}} is called:
> {noformat}
> BASE    <file:///Users/rvesse/Documents/Work/Code/jena-playground/>
> SELECT  *
> WHERE
>   { <path/to/thing> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ?type }
> {noformat}
> More generally, it seems that whenever an implicit Base URI is used, or a
> Base URI is passed only to the {{QueryFactory.create()}} or
> {{UpdateFactory.create()}} call, a {{BASE}} declaration is never written.  That
> is, when there is an {{IRIResolver}} set but no specific Base URI, no {{BASE}}
> declaration is written, yet URIs are still serialised in relative form.
> We can appreciate that other people may have use cases where leaving relative
> URIs as-is and not including a {{BASE}} is desirable, but our feeling is that in
> the more general case this does more harm than good and lets users shoot
> themselves in the foot unwittingly, as we have done in this example.
> We would like to propose that the default behaviour should be for a {{BASE}}
> declaration to always be written if relative URIs are being output, or at
> the very least for the behaviour to be configurable.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
