LorenzBuehmann opened a new issue, #1259:
URL: https://github.com/apache/jena/issues/1259

   When the query string is longer than the `GET` request threshold, i.e. when the `POST` send mode is used, the body content is not marked as UTF-8 encoded:
   
   ### Example query:
   ```sparql
   PREFIX wd: <http://www.wikidata.org/entity/>
   PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
   PREFIX geo: <http://www.opengis.net/ont/geosparql#>
   PREFIX geof: <http://www.opengis.net/def/function/geosparql/>
   PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
   PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
   PREFIX coy: <https://schema.coypu.org/#>
   PREFIX data:      <https://data.coypu.org/country/>
   
   PREFIX wikibase: <http://wikiba.se/ontology#>
   PREFIX bd: <http://www.bigdata.com/rdf#>
   PREFIX mwapi: <https://www.mediawiki.org/ontology#API/>
   PREFIX wdt: <http://www.wikidata.org/prop/direct/>
   
   SELECT * {
     BIND("Curaçao" AS ?str)
     SERVICE <https://query.wikidata.org/sparql> {
       SELECT ?item ?itemLabel ?typeLabel ?str {
         SERVICE wikibase:mwapi {
           bd:serviceParam wikibase:endpoint "www.wikidata.org";
                           wikibase:api "EntitySearch";
                           mwapi:search ?str ;
                           mwapi:language "en";
                           wikibase:limit 5 .
           ?item wikibase:apiOutputItem mwapi:item.
           ?num wikibase:apiOrdinal true.
         }
         ?item (wdt:P279|wdt:P31) ?type
         FILTER(?type not in (wd:Q4167410, wd:Q13442814, wd:Q13433827))
         FILTER (EXISTS {?type wdt:P279* wd:Q618123} || EXISTS {?type wdt:P279* wd:Q1048835})
         SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
       }
     }
   }
   ```
   Ignore the meaning of the query; it just does an entity lookup in Wikidata via a `SERVICE` clause. The important part is the `BIND` of the string `Curaçao`, which contains a non-ASCII character (`ç`).
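The sketch below uses only the JDK to show why a single character is enough to matter: `ç` (U+00E7) takes two bytes in UTF-8 but one in ISO-8859-1. The ISO-8859-1 fallback on the receiving side is purely an assumption for illustration; the issue does not say how the endpoint actually decodes a body that carries no charset.

```java
import java.nio.charset.StandardCharsets;

public class CharsetDemo {
    // "Curaçao" written with a Unicode escape so the source-file encoding does not matter
    static final String STR = "Cura\u00e7ao";

    // Byte count in UTF-8: 'ç' becomes the two bytes 0xC3 0xA7
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Byte count in ISO-8859-1: 'ç' is the single byte 0xE7
    static int latin1Length(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1).length;
    }

    // Assumption: what a receiver would see if it decoded UTF-8 bytes as ISO-8859-1
    static String misreadAsLatin1(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        System.out.println(utf8Length(STR));   // 8
        System.out.println(latin1Length(STR)); // 7
        // The misread string has U+00C3 U+00A7 where 'ç' used to be
        System.out.println(misreadAsLatin1(STR).equals(STR)); // false
    }
}
```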
   
   The result of this query is empty with (at least) Jena 4.4.0 and 4.5.0-SNAPSHOT, whereas it works with Jena 4.1.0, for example. It also works if we remove one of the `FILTER`s from the query, because the shorter query is then sent as a plain `GET` request.
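The `GET`/`POST` switch described above can be pictured with the JDK's `java.net.http` types alone. This is a hypothetical sketch, not Jena's implementation: the class name, the `URL_LIMIT` of 2048 characters and the rule of measuring the full request URL are assumptions. A short query travels in the URL, where `ç` is percent-encoded as its UTF-8 bytes (`%C3%A7`); a long one goes into a `POST` body, whose charset can only be declared through the `Content-Type` header.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.net.http.HttpRequest.BodyPublishers;
import java.nio.charset.StandardCharsets;

public class SendModeSketch {
    // Hypothetical threshold; Jena's real limit and how it is measured may differ
    static final int URL_LIMIT = 2048;

    static HttpRequest buildRequest(String endpoint, String queryString) {
        // Percent-encode the query from its UTF-8 bytes, e.g. 'ç' becomes %C3%A7
        String encoded = URLEncoder.encode(queryString, StandardCharsets.UTF_8);
        String getUrl = endpoint + "?query=" + encoded;
        if (getUrl.length() <= URL_LIMIT) {
            // Short query: everything travels in the URL, no request body
            return HttpRequest.newBuilder(URI.create(getUrl)).GET().build();
        }
        // Long query: send it as the body; no charset is declared, mirroring the issue
        return HttpRequest.newBuilder(URI.create(endpoint))
                .header("Content-Type", "application/sparql-query")
                .POST(BodyPublishers.ofString(queryString))
                .build();
    }

    public static void main(String[] args) {
        String endpoint = "https://query.wikidata.org/sparql";
        String shortQuery = "SELECT * { BIND(\"Cura\u00e7ao\" AS ?str) }";
        // Pad with a SPARQL comment so the URL form exceeds the limit
        String longQuery = shortQuery + " #" + "x".repeat(3000);
        System.out.println(buildRequest(endpoint, shortQuery).method()); // GET
        System.out.println(buildRequest(endpoint, longQuery).method());  // POST
    }
}
```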
   
   I recall that Jena's HTTP layer was switched to the HTTP client built into Java 11 (`java.net.http`); that may be where the behavior changed.
   
   -----
   Note: I know that according to the [SPARQL 1.1 Protocol](https://www.w3.org/TR/sparql11-protocol/#query-via-post-direct) the body should always be treated as UTF-8; at least, it states:
   
   > Note that UTF-8 is the only valid charset here. 
   
   So in the end, this looks more like a Blazegraph issue (the Wikidata query service runs on Blazegraph).
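On the server side, the rule quoted above means an endpoint can decode an `application/sparql-query` body as UTF-8 whether or not a charset parameter is present. The helper below is a hypothetical illustration of that rule, not Blazegraph's code; the method name and the choice to simply ignore the charset parameter are assumptions.

```java
import java.nio.charset.StandardCharsets;

public class SparqlBodyDecoder {
    // Hypothetical server-side helper. Per the SPARQL 1.1 Protocol, UTF-8 is the only
    // valid charset for application/sparql-query, so the body is always decoded as UTF-8.
    static String decodeQueryBody(String contentType, byte[] body) {
        // Strip any parameters such as "; charset=UTF-8" to get the bare media type
        String mediaType = contentType.split(";", 2)[0].trim();
        if (!mediaType.equalsIgnoreCase("application/sparql-query")) {
            throw new IllegalArgumentException("Unsupported media type: " + mediaType);
        }
        return new String(body, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] body = "BIND(\"Cura\u00e7ao\" AS ?str)".getBytes(StandardCharsets.UTF_8);
        // Header without a charset (as in the issue) and with one (as in the patch)
        String a = decodeQueryBody("application/sparql-query", body);
        String b = decodeQueryBody("application/sparql-query; charset=UTF-8", body);
        System.out.println(a.equals(b));                // true
        System.out.println(a.contains("Cura\u00e7ao")); // true
    }
}
```

A stricter endpoint could instead reject a body labeled with any charset other than UTF-8; this sketch takes the lenient route so that requests with and without the parameter behave identically.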
   
   ----
   Nevertheless, the old HTTP implementation probably stated the UTF-8 encoding explicitly.
   
   I tried a quick fix in the method `QueryExecHTTP::executeQueryPostBody`:
   ```java
       // Use SPARQL query body and MIME type.
       private HttpRequest.Builder executeQueryPostBody(Params thisParams, String acceptHeader) {
           // Use thisParams (for default-graph-uri etc)
           String requestURL = requestURL(service, thisParams.httpString());
           HttpRequest.Builder builder =
                   HttpLib.requestBuilder(requestURL, httpHeaders, readTimeout, readTimeoutUnit);
           contentTypeHeader(builder, WebContent.contentTypeSPARQLQuery + "; charset=UTF-8"); // this line has been changed
           acceptHeader(builder, acceptHeader);
           return builder.POST(BodyPublishers.ofString(queryString));
       }
   ```
   
   This solved the issue. I'm not sure whether leaving out the charset was intentional, but I doubt it's harmful to state the encoding explicitly.
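The effect of the one-line change can also be checked outside Jena with the JDK client alone. In this sketch the class and method names are made up and Jena's `HttpLib` helpers are not used. `BodyPublishers.ofString(String)` already converts the body using UTF-8, so the bytes sent are the same with or without the patch; only the `Content-Type` label changes.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.net.http.HttpRequest.BodyPublishers;

public class PostBodySketch {
    static final String SPARQL_QUERY = "application/sparql-query";

    // Builds a "query via POST directly" request; withCharset mirrors the patched line
    static HttpRequest postQuery(URI endpoint, String queryString, boolean withCharset) {
        String contentType = withCharset ? SPARQL_QUERY + "; charset=UTF-8" : SPARQL_QUERY;
        return HttpRequest.newBuilder(endpoint)
                .header("Content-Type", contentType)
                // ofString(String) converts the string using the UTF-8 charset
                .POST(BodyPublishers.ofString(queryString))
                .build();
    }

    public static void main(String[] args) {
        URI endpoint = URI.create("https://query.wikidata.org/sparql");
        String query = "SELECT * { BIND(\"Cura\u00e7ao\" AS ?str) }";
        HttpRequest before = postQuery(endpoint, query, false);
        HttpRequest after = postQuery(endpoint, query, true);
        System.out.println(before.headers().firstValue("Content-Type").orElse("")); // application/sparql-query
        System.out.println(after.headers().firstValue("Content-Type").orElse(""));  // application/sparql-query; charset=UTF-8
        // Same body length either way: the body is UTF-8 in both cases
        System.out.println(before.bodyPublisher().get().contentLength()
                == after.bodyPublisher().get().contentLength()); // true
    }
}
```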
   

