LorenzBuehmann opened a new issue, #1259: URL: https://github.com/apache/jena/issues/1259
When a query string is longer than the `GET` request threshold, so that `POST` send mode is used, the body content is not marked as UTF-8 encoded.

### Example query:

```sparql
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX geo: <http://www.opengis.net/ont/geosparql#>
PREFIX geof: <http://www.opengis.net/def/function/geosparql/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX coy: <https://schema.coypu.org/#>
PREFIX data: <https://data.coypu.org/country/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX bd: <http://www.bigdata.com/rdf#>
PREFIX mwapi: <https://www.mediawiki.org/ontology#API/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>

SELECT * {
  BIND("Curaçao" AS ?str)
  SERVICE <https://query.wikidata.org/sparql> {
    SELECT ?item ?itemLabel ?typeLabel ?str {
      SERVICE wikibase:mwapi {
        bd:serviceParam wikibase:endpoint "www.wikidata.org";
                        wikibase:api "EntitySearch";
                        mwapi:search ?str ;
                        mwapi:language "en";
                        wikibase:limit 5 .
        ?item wikibase:apiOutputItem mwapi:item.
        ?num wikibase:apiOrdinal true.
      }
      ?item (wdt:P279|wdt:P31) ?type
      FILTER(?type not in (wd:Q4167410, wd:Q13442814, wd:Q13433827))
      FILTER (EXISTS {?type wdt:P279* wd:Q618123} || EXISTS {?type wdt:P279* wd:Q1048835 })
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
    }
  }
}
```

Ignore the meaning of the query; it just does an entity lookup in Wikidata via a `SERVICE` clause. The important part is the `BIND` with the string `Curaçao`, which contains a non-ASCII character.

The result of this query is empty with (at least) Jena 4.4.0 and 4.5.0 SNAPSHOT. It works with Jena 4.1.0, for example. It also works if we remove one of the `FILTER`s, which shortens the query enough that a simple `GET` request is sent.

I remember that the HTTP API was switched to the Java 11 built-in one; that might be the point where the behavior changed.
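To illustrate why a missing charset declaration can produce an empty result rather than an error, here is a minimal sketch. It assumes, which this report does not confirm, that the remote endpoint falls back to ISO-8859-1 when the `Content-Type` header carries no charset. The class and method names are hypothetical and not part of Jena.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetMismatchDemo {

    // Encode the text as UTF-8 (what the client puts on the wire),
    // then decode the bytes with the charset the server is ASSUMED to use.
    public static String decodeAs(String text, Charset serverCharset) {
        byte[] wire = text.getBytes(StandardCharsets.UTF_8);
        return new String(wire, serverCharset);
    }

    public static void main(String[] args) {
        // "\u00e7" is the c-cedilla in "Curaçao"; escaped to keep the source ASCII-only.
        String literal = "Cura\u00e7ao";

        String ok = decodeAs(literal, StandardCharsets.UTF_8);
        String garbled = decodeAs(literal, StandardCharsets.ISO_8859_1);

        // Same bytes, different interpretation: the two-byte UTF-8 sequence
        // C3 A7 becomes two separate characters under ISO-8859-1.
        System.out.println(ok.equals(literal));      // true
        System.out.println(garbled.equals(literal)); // false
        System.out.println(garbled.length());        // 8 (the original has 7 characters)
    }
}
```

Under that assumption the search term that reaches the `EntitySearch` API is no longer `Curaçao`, so a lookup that returns nothing would be the expected symptom.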
-----

Note: I know that according to the [Standard](https://www.w3.org/TR/sparql11-protocol/#query-via-post-direct) the body should always be treated as UTF-8. At least it states:

> Note that UTF-8 is the only valid charset here.

So in the end this looks more like a Blazegraph issue.

----

Nevertheless, the UTF-8 encoding was probably stated explicitly in the old HTTP API implementation. I tried a quick fix in the method `QueryExecHTTP::executeQueryPostBody`:

```java
// Use SPARQL query body and MIME type.
private HttpRequest.Builder executeQueryPostBody(Params thisParams, String acceptHeader) {
    // Use thisParams (for default-graph-uri etc)
    String requestURL = requestURL(service, thisParams.httpString());
    HttpRequest.Builder builder = HttpLib.requestBuilder(requestURL, httpHeaders, readTimeout, readTimeoutUnit);
    contentTypeHeader(builder, WebContent.contentTypeSPARQLQuery + "; charset=UTF-8"); // this line has been changed
    acceptHeader(builder, acceptHeader);
    return builder.POST(BodyPublishers.ofString(queryString));
}
```

This solved the issue. I don't know whether omitting the charset is intended, but I doubt it's harmful to state the encoding explicitly.
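For readers who want to try the idea outside Jena, the following is a minimal sketch using only the JDK's `java.net.http` API. It is not Jena's implementation: the class name, the `buildPost` helper, the `Accept` value, and the endpoint URL are hypothetical placeholders. The request is only built and inspected, never sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.net.http.HttpRequest.BodyPublishers;
import java.nio.charset.StandardCharsets;

public class SparqlPostBodySketch {

    // SPARQL query media type with the charset spelled out.
    static final String SPARQL_QUERY_UTF8 = "application/sparql-query; charset=UTF-8";

    // Hypothetical helper: builds a "query via POST directly" request.
    public static HttpRequest buildPost(String endpoint, String queryString) {
        return HttpRequest.newBuilder(URI.create(endpoint))
                .header("Content-Type", SPARQL_QUERY_UTF8)
                .header("Accept", "application/sparql-results+json")
                // Pass the charset explicitly so the body bytes and the
                // declared charset cannot drift apart.
                .POST(BodyPublishers.ofString(queryString, StandardCharsets.UTF_8))
                .build();
    }

    public static void main(String[] args) {
        // Placeholder endpoint; no network traffic happens in this sketch.
        HttpRequest request = buildPost(
                "http://localhost:3030/ds/sparql",
                "SELECT * { BIND(\"Cura\u00e7ao\" AS ?str) }");

        System.out.println(request.method());
        System.out.println(request.headers().firstValue("Content-Type").orElse("(none)"));
    }
}
```

`BodyPublishers.ofString(String)` without a charset argument also encodes as UTF-8, so the body bytes are presumably the same before and after the quick fix. The difference is that the header now tells the server how to decode them.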
