[
https://issues.apache.org/jira/browse/JENA-785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137027#comment-14137027
]
Osma Suominen commented on JENA-785:
------------------------------------
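My guess is that this problem has to do with the handling of supplementary
Unicode characters (above U+FFFF) in the SUBSTR function. Java strings are
UTF-16, so each such character is stored as a surrogate pair of two 16-bit
chars. If SUBSTR treats every 16-bit unit as one character, it will chop the
first character (U+10408) in half and return a lone high surrogate. That could
explain the strange text format response, and possibly also the
MalformedInputException with the other output formats.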
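To illustrate the guess, here is a small Java sketch. It is not Jena's code:
naiveSubstr and strictUtf8 are hypothetical helpers. Slicing the Deseret label
by 16-bit units keeps only the high surrogate U+D801 of U+10408, and encoding
that lone surrogate with a strict UTF-8 encoder gives the same
"MalformedInputException: Input length = 1" as in the report. I haven't checked
which encoder the Fuseki result writers actually use; this assumes one that
reports malformed input (String.getBytes(StandardCharsets.UTF_8), for example,
would silently substitute '?' instead of throwing).

```java
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

public class SurrogateSplitDemo {
    // The Deseret label from the issue, built from its code points. All are above
    // U+FFFF, so each letter occupies two Java chars (a surrogate pair).
    static final int[] CODE_POINTS = {
        0x10408, 0x1044C, 0x1043B, 0x1042A, 0x10449,
        0x1043F, 0x1043B, 0x1042E, 0x1043F, 0x10432
    };
    static final String LABEL = new String(CODE_POINTS, 0, CODE_POINTS.length);

    // Hypothetical naive SUBSTR (1-based start) that treats each 16-bit char
    // as one character.
    static String naiveSubstr(String s, int start, int length) {
        return s.substring(start - 1, start - 1 + length);
    }

    // Encodes with a strict UTF-8 encoder: newEncoder() defaults to
    // CodingErrorAction.REPORT, so malformed input throws instead of being replaced.
    static String strictUtf8(String s) {
        CharsetEncoder enc = StandardCharsets.UTF_8.newEncoder();
        try {
            enc.encode(CharBuffer.wrap(s));
            return "ok";
        } catch (CharacterCodingException e) {
            return e.getClass().getSimpleName() + ": " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(LABEL.length());                          // 20 UTF-16 units
        System.out.println(LABEL.codePointCount(0, LABEL.length())); // 10 characters
        String l = naiveSubstr(LABEL, 1, 1);
        System.out.printf("U+%04X%n", (int) l.charAt(0));          // U+D801, lone high surrogate
        System.out.println(strictUtf8(LABEL));                       // ok
        System.out.println(strictUtf8(l)); // MalformedInputException: Input length = 1
    }
}
```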
Possibly relevant:
http://www.oracle.com/technetwork/articles/java/supplementary-142654.html
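Along the lines of that article (using Java's code point methods), a SUBSTR
that counts characters instead of UTF-16 units could look roughly like the
sketch below. SPARQL 1.1 defines SUBSTR after XPath fn:substring, where
positions are 1-based and count characters; this sketch handles integer
arguments only and leaves out the rounding rules for non-integer ones. The
class and method names are hypothetical, not Jena's.

```java
public class CodePointSubstr {
    // Keeps the characters (code points) at 1-based positions p with
    // start <= p < start + length, clamped to the string's bounds.
    static String substr(String s, int start, int length) {
        int n = s.codePointCount(0, s.length());
        long from = Math.max(start, 1);                    // first position kept
        long to = Math.min((long) start + length, n + 1L); // exclusive end position
        if (from >= to) {
            return "";
        }
        // Translate code point positions into UTF-16 indexes.
        int begin = s.offsetByCodePoints(0, (int) from - 1);
        int end = s.offsetByCodePoints(begin, (int) (to - from));
        return s.substring(begin, end);
    }

    public static void main(String[] args) {
        String label = new String(new int[] {0x10408, 0x1044C, 0x1043B}, 0, 3);
        String l = substr(label, 1, 1);
        System.out.println(l.codePointAt(0) == 0x10408); // true: one whole character
        System.out.println(l.length());                   // 2: the full surrogate pair
    }
}
```

Clamping with a long end position avoids int overflow when start + length is
very large, and an empty range simply yields "".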
> exception getting substr() from Deseret string
> ----------------------------------------------
>
> Key: JENA-785
> URL: https://issues.apache.org/jira/browse/JENA-785
> Project: Apache Jena
> Issue Type: Bug
> Environment: Ubuntu 14.04 amd64
> $ java -version
> java version "1.7.0_65"
> OpenJDK Runtime Environment (IcedTea 2.5.1) (7u65-2.5.1-4ubuntu1~0.14.04.2)
> OpenJDK 64-Bit Server VM (build 24.65-b04, mixed mode)
> Reporter: Osma Suominen
>
> I stumbled on a problem with a SPARQL query, run against Fuseki 1.1.0 and the
> latest 1.1.1 snapshot. This is a minimal example demonstrating the problem.
> I have the following single triple (based on Lexvo.org) as my dataset:
> <http://lexvo.org/id/iso3166/AQ> <http://www.w3.org/2000/01/rdf-schema#label>
> "\U00010408\U0001044C\U0001043B\U0001042A\U00010449\U0001043F\U0001043B\U0001042E\U0001043F\U00010432"@en-Dsrt .
> (This is a label in the rather obscure Deseret alphabet; see Wikipedia for
> details about the script.)
> Then I ran this query:
> --cut--
> SELECT ?label (SUBSTR(?label, 1, 1) as ?l)
> WHERE { ?s <http://www.w3.org/2000/01/rdf-schema#label> ?label }
> --cut--
> With the text output, I get this:
> --cut--
> ------------------------------------------------
> | label | l |
> ================================================
> | "𐐈𐑌𐐻𐐪𐑉𐐿𐐻𐐮𐐿𐐲"@en-Dsrt | " |
> ------------------------------------------------
> --cut--
> (You may need to install a Deseret font if you can't see the letters above. I
> use DeseretBee from here:
> http://copper.chem.ucla.edu/~jericks/Fonts/Bee%20Fonts/Sans%20Serif/_DeseretBee2.ttf)
> The result is not completely wrong, but there's something fishy about the ?l
> binding: it shows only a single quote character.
> With output set to JSON, XML, CSV or TSV, I get this error instead:
> --cut--
> Error 500: java.nio.charset.MalformedInputException: Input length = 1
> Fuseki - version 1.1.1-SNAPSHOT (Build date: 2014-09-16T13:16:19+0000)
> --cut--
> As you can see, I'm using the latest Fuseki snapshot from yesterday.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)