[ 
https://issues.apache.org/jira/browse/COMMONSRDF-51?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15824660#comment-15824660
 ] 

ASF GitHub Bot commented on COMMONSRDF-51:
------------------------------------------

Github user afs commented on the issue:

    https://github.com/apache/commons-rdf/pull/30
  
    @ansell mentions one of the reasons the wording in RDF 1.1 is not so direct: 
RDF 1.0 did not sanction the common normalization defined by BCP47 
canonicalization, although full canonicalization also requires consulting the 
registry.
    
    Jena is lax by default, and retains the form as originally written. In 
practice, datasets seem to be internally consistent, all lower case or all 
syntax-canonical. 
    
    Tags that differ only in case are different nodes in the general case, but 
they compare equal under `Node.sameValue` and match in graph.find. Some storage 
layers may differ and canonicalize the form in order to index.



> RDF-1.1 specifies that language tags need to be compared using lower-case
> -------------------------------------------------------------------------
>
>                 Key: COMMONSRDF-51
>                 URL: https://issues.apache.org/jira/browse/COMMONSRDF-51
>             Project: Apache Commons RDF
>          Issue Type: Bug
>          Components: api
>    Affects Versions: 0.3.0
>            Reporter: Peter Ansell
>            Assignee: Stian Soiland-Reyes
>
> The RDF-1.1 specification states that the [value space of Literal language 
> tags is 
> lowercase|https://www.w3.org/TR/rdf11-concepts/#section-Graph-Literal], which 
> does not conflict with the case-insensitive specification in BCP47. The 
> Literal.equals and Literal.hashCode API contracts should specify that 
> language tags must be compared using lowercase, even if they are otherwise 
> stored and returned as upper-case by getLanguageTag. The API currently has 
> incorrect language by saying "character-by-character" for language tag 
> comparisons, as that implies case-sensitive comparisons are used.
> The lowercasing must also be done using a consistent locale (a known 
> example where lowercase and uppercase do not round-trip as expected for 
> US-ASCII characters is Turkish [1]), so I would recommend explicitly stating 
> that .toLowerCase(Locale.ENGLISH) is used.
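
A minimal Java sketch of the comparison proposed in the issue description 
above: the tag is kept as originally written but is lowercased with a fixed 
locale for equals and hashCode. The names LangTagSketch, TaggedLiteral, 
normalize and sameLanguageTag are hypothetical and are not part of the 
Commons RDF API or of Jena; only String.toLowerCase(Locale) and 
Locale.forLanguageTag are standard JDK calls.

```java
import java.util.Locale;
import java.util.Objects;

final class LangTagSketch {

    // Lowercase with a fixed locale so the result does not depend on the
    // JVM default locale (hypothetical helper, not a Commons RDF method).
    static String normalize(String tag) {
        return tag.toLowerCase(Locale.ENGLISH);
    }

    static boolean sameLanguageTag(String a, String b) {
        return normalize(a).equals(normalize(b));
    }

    // Hypothetical literal: stores the tag as written, compares it lowercased.
    static final class TaggedLiteral {
        private final String lexicalForm;
        private final String languageTag;

        TaggedLiteral(String lexicalForm, String languageTag) {
            this.lexicalForm = lexicalForm;
            this.languageTag = languageTag;
        }

        String getLanguageTag() {
            return languageTag; // returned in its original case
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof TaggedLiteral)) {
                return false;
            }
            TaggedLiteral other = (TaggedLiteral) o;
            return lexicalForm.equals(other.lexicalForm)
                    && sameLanguageTag(languageTag, other.languageTag);
        }

        @Override
        public int hashCode() {
            // Must hash the normalized tag to stay consistent with equals.
            return Objects.hash(lexicalForm, normalize(languageTag));
        }
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr");
        // Under Turkish rules "I" lowercases to dotless i (U+0131).
        System.out.println("IT".toLowerCase(turkish).equals("it"));        // false
        System.out.println("IT".toLowerCase(Locale.ENGLISH).equals("it")); // true
        System.out.println(new TaggedLiteral("Hello", "en-GB")
                .equals(new TaggedLiteral("Hello", "EN-gb")));              // true
    }
}
```

With this shape getLanguageTag still returns "en-GB" or "EN-gb" as supplied, 
while the two literals are equal and hash alike.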



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
