[ 
https://issues.apache.org/jira/browse/JENA-1377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16098935#comment-16098935
 ] 

Andy Seaborne commented on JENA-1377:
-------------------------------------

RDF 1.1 says:

bq. Lexical representations of language tags MAY be converted to lower case. 
The value space of language tags is always in lower case.

and

bq. Literal term equality: Two literals are term-equal (the same RDF literal) 
if and only if the two lexical forms, the two datatype IRIs, and the two 
language tags (if any) compare equal, character by character. Thus, two 
literals can have the same value without being the same RDF term.

Jena isomorphism is based on term equality, not value equality. Compare the 
literals 001 and 1: they have the same value but are different RDF terms.
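
The distinction can be sketched in plain Java. This is only an illustration: 
the {{Lit}} class and its methods are hypothetical stand-ins and are not Jena 
classes, and value equality is sketched for xsd:integer only.

```java
import java.math.BigInteger;

/** Hypothetical stand-in for an RDF literal; NOT a Jena class. */
final class Lit {
    static final String XSD_INTEGER =
        "http://www.w3.org/2001/XMLSchema#integer";
    static final String RDF_LANGSTRING =
        "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString";

    final String lexical;   // lexical form, e.g. "001"
    final String datatype;  // datatype IRI
    final String lang;      // language tag, or "" if there is none

    Lit(String lexical, String datatype, String lang) {
        this.lexical = lexical;
        this.datatype = datatype;
        this.lang = lang;
    }

    /** Term equality: every component compared character by character. */
    boolean sameTerm(Lit other) {
        return lexical.equals(other.lexical)
            && datatype.equals(other.datatype)
            && lang.equals(other.lang);
    }

    /** Value equality, sketched for xsd:integer literals only. */
    boolean sameIntegerValue(Lit other) {
        return datatype.equals(XSD_INTEGER)
            && other.datatype.equals(XSD_INTEGER)
            && new BigInteger(lexical).equals(new BigInteger(other.lexical));
    }

    public static void main(String[] args) {
        Lit a = new Lit("001", XSD_INTEGER, "");
        Lit b = new Lit("1", XSD_INTEGER, "");
        System.out.println(a.sameTerm(b));          // false: lexical forms differ
        System.out.println(a.sameIntegerValue(b));  // true: both denote 1

        Lit c = new Lit("example", RDF_LANGSTRING, "zh-Latn-pinyin");
        Lit d = new Lit("example", RDF_LANGSTRING, "zh-latn-pinyin");
        System.out.println(c.sameTerm(d));          // false: tags differ in case
    }
}
```

Under this reading, the two literals in the report differ in the same way that 
001 and 1 do: they are distinct terms even if an application regards them as 
equivalent.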

You can canonicalise literals on input with 
{{RDFParserBuilder.canonicalLiterals}} or feed the parser output to your own 
choice of algorithm.

Some users want language tags preserved as written, some want lower-case 
language tags, and some want BCP47-canonical tags.  There is no single right 
choice, and it becomes more complicated when the triple count must also be 
preserved.

A change would invalidate existing data. 

Given these requirements, some work by the application is often necessary.
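
As a rough sketch of what that application-side work might look like, the 
three policies can be written as plain functions over the tag string. Nothing 
here is Jena API: the class and method names are hypothetical, and 
{{java.util.Locale}} is used only to apply the BCP47 case conventions, which is 
an approximation of full BCP47 canonicalisation.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.UnaryOperator;

/** Hypothetical helper illustrating three language tag policies. */
final class LangTagPolicy {

    /** Keep the tag exactly as written. */
    static String preserve(String tag) {
        return tag;
    }

    /** Lower-case the whole tag. */
    static String lowerCase(String tag) {
        return tag.toLowerCase(Locale.ROOT);
    }

    /**
     * Apply BCP47 case conventions via java.util.Locale.
     * Assumption: the tag is well-formed; Locale silently drops
     * subtags it cannot parse, so real code should validate first.
     */
    static String bcp47Case(String tag) {
        return Locale.forLanguageTag(tag).toLanguageTag();
    }

    /** Distinct tags that remain after applying a policy. */
    static Set<String> distinctTags(List<String> tags,
                                    UnaryOperator<String> policy) {
        Set<String> out = new LinkedHashSet<>();
        for (String tag : tags) {
            out.add(policy.apply(tag));
        }
        return out;
    }

    public static void main(String[] args) {
        // Tags on two literals that otherwise share the same lexical form.
        List<String> tags = List.of("en-US", "en-us");

        System.out.println(distinctTags(tags, LangTagPolicy::preserve));  // [en-US, en-us]
        System.out.println(distinctTags(tags, LangTagPolicy::lowerCase)); // [en-us]
        System.out.println(distinctTags(tags, LangTagPolicy::bcp47Case)); // [en-US]
    }
}
```

The last two policies collapse the two tags into one, so two triples that 
differed only in tag case would become a single triple. That is the 
triple-count effect mentioned above.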

See other discussions on Jena JIRA about isomorphism and term equality, e.g. 
JENA-1303.


> Model.isIsomorphicWith() returns false when language tags case do not match
> ---------------------------------------------------------------------------
>
>                 Key: JENA-1377
>                 URL: https://issues.apache.org/jira/browse/JENA-1377
>             Project: Apache Jena
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: Jena 3.3.0
>         Environment: Linux
>            Reporter: Elie Roux
>
> Model.isIsomorphicWith() treats language tags in a case-sensitive way, which 
> is against the BCP47 spec. This is easily shown with an example:
> {noformat}
>            Model m1 = ModelFactory.createDefaultModel();
>            Resource r = m1.getResource("http://example.com/resource");
>            Property p = m1.getProperty("http://example.com/property");
>            // canonical
>            m1.add(r, p, m1.createLiteral("example", "zh-Latn-pinyin"));
>            Model m2 = ModelFactory.createDefaultModel();
>            r = m2.getResource("http://example.com/resource");
>            p = m2.getProperty("http://example.com/property");
>            // lower case
>            m2.add(r, p, m1.createLiteral("example", "zh-latn-pinyin"));
>            System.out.println(m1.isIsomorphicWith(m2));
> {noformat}
> prints false, while it clearly should print true. Related bug (which is not 
> really a bug per se, just a trigger for this one): 
> https://github.com/jsonld-java/jsonld-java/issues/199
> See also https://issues.apache.org/jira/browse/COMMONSRDF-51 for some 
> consideration of the language tag case.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
