[ 
https://issues.apache.org/jira/browse/JENA-129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13119517#comment-13119517
 ] 

Andy Seaborne commented on JENA-129:
------------------------------------

The issue with i18n/normalization-01.ttl is that the tokenization for prefix 
names does not include combining characters (Unicode: [#0300-#036F]).
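
To illustrate the kind of check involved, here is a minimal sketch of a name-character test that accepts the combining range. The class and method names are hypothetical and the character classes are deliberately simplified; this is not RIOT's tokenizer code.

```java
// Hypothetical helper for illustration only; not taken from RIOT.
class PrefixNameChars {
    /** True if cp lies in the combining diacritical marks range [#0300-#036F]. */
    static boolean isCombiningMark(int cp) {
        return cp >= 0x0300 && cp <= 0x036F;
    }

    /** Simplified name-character test: letters, digits, '_', '-', and combining marks. */
    static boolean isPrefixNameChar(int cp) {
        return Character.isLetterOrDigit(cp) || cp == '_' || cp == '-' || isCombiningMark(cp);
    }

    /** Returns how many chars at the start of s are accepted as name characters. */
    static int scanLocalName(String s) {
        int i = 0;
        while (i < s.length()) {
            int cp = s.codePointAt(i);
            if (!isPrefixNameChar(cp)) {
                break;
            }
            i += Character.charCount(cp);
        }
        return i;
    }

    public static void main(String[] args) {
        // "resume" with each accented e written as 'e' plus a combining acute accent.
        String decomposed = "re\u0301sume\u0301";
        // Without the isCombiningMark clause the scan would stop after "re".
        System.out.println(scanLocalName(decomposed) == decomposed.length());
    }
}
```

Dropping the `isCombiningMark` clause makes the scan stop at the first accent, which is the general shape of the failure described above.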

Priority changed to minor.  Combining characters are unusual in URIs, and they 
lead to trouble anyway: URIs with identical visual appearance can be unequal.  
For example, é (U+00E9) and é (e followed by U+0301) are different: the second 
is two characters, the second of which is a combining diacritic accent.
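
The inequality is easy to see from plain Java strings. The sketch below uses java.text.Normalizer only to show that the two forms are canonically equivalent; it does not imply that RIOT normalizes IRIs. The class and method names are made up for the example.

```java
import java.text.Normalizer;

// Illustration only: precomposed vs. decomposed forms of the same visible character.
class CombiningEquality {
    /** Compares two strings after converting both to Unicode NFC. */
    static boolean equalAfterNfc(String a, String b) {
        String na = Normalizer.normalize(a, Normalizer.Form.NFC);
        String nb = Normalizer.normalize(b, Normalizer.Form.NFC);
        return na.equals(nb);
    }

    public static void main(String[] args) {
        String precomposed = "http://example/\u00E9";  // single code point U+00E9
        String decomposed = "http://example/e\u0301";  // 'e' followed by combining acute U+0301

        // Character-by-character comparison treats these as different strings.
        System.out.println(precomposed.equals(decomposed));        // false
        // The decomposed form is one char longer.
        System.out.println(decomposed.length() - precomposed.length()); // 1
        // They only match once both are normalized to the same form.
        System.out.println(equalAfterNfc(precomposed, decomposed)); // true
    }
}
```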

The background email message has a confusing title: this is not a UTF-8 issue. 
Also, the email is best viewed as plain text, which shows the difference 
between é and e followed by  ́.

RIOT uses the Java platform decoder for UTF-8.
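
As a rough check that decoding is not where the problem lies, the sketch below decodes both byte sequences with the standard Java UTF-8 charset. It uses the plain String constructor as a stand-in for whatever decoder setup RIOT actually has; the class name is hypothetical.

```java
import java.nio.charset.Charset;

// Illustration only: both forms are valid UTF-8 and decode without error.
class Utf8Decode {
    static final Charset UTF8 = Charset.forName("UTF-8");

    /** Decodes bytes as UTF-8 using the Java platform decoder. */
    static String decode(byte[] bytes) {
        return new String(bytes, UTF8);
    }

    public static void main(String[] args) {
        byte[] precomposed = { (byte) 0xC3, (byte) 0xA9 };      // U+00E9
        byte[] decomposed = { 0x65, (byte) 0xCC, (byte) 0x81 }; // 'e' then U+0301

        String a = decode(precomposed);
        String b = decode(decomposed);

        // Each sequence decodes to the code points it encodes; nothing is lost or merged.
        System.out.println(a.length()); // 1
        System.out.println(b.length()); // 2
        System.out.println(Integer.toHexString(b.charAt(1))); // 301
    }
}
```

The decoder hands the combining accent through as its own character, so any difference in behaviour arises later, in tokenization.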

The validation warning when parsing "DAWG-Final/i18n/normalization-02.ttl" is 
unrelated.  The message is correct (look at the input URI).  It's outputting the 
post-resolution IRI.

                
> RIOT not parsing UTF combining characters correctly
> ---------------------------------------------------
>
>                 Key: JENA-129
>                 URL: https://issues.apache.org/jira/browse/JENA-129
>             Project: Jena
>          Issue Type: Bug
>          Components: RIOT
>         Environment: Java 1.6, Windows 7, ARQ 2.8.8
>            Reporter: Tim Harsch
>              Labels: RIOT
>
> Background on the issue can be found at the list archive:
> http://mail-archives.apache.org/mod_mbox/incubator-jena-users/201110.mbox/%[email protected]%3E
> RIOT failed to parse the SPARQL 1.0 DAWG test: "i18n/normalization-01.ttl".
> In offline email Andy also noted:
> I see one oddity:
> [[
> ==== DAWG-Final/i18n/normalization-02.ttl
> WARN  [line: 7, col: 8 ] Bad IRI: <eXAMPLE://a/b/%63/%7bfoo%7d#xyz>
> Code: 8/NON_INITIAL_DOT_SEGMENT in PATH: The path contains a segment
> /../ not at the beginning of a relative reference, or it contains a /./
> These should be removed.
> ]]
> because the test is on the input, not the resultant form used for toString.


