[jira] [Commented] (JENA-1313) Language-specific collation in ARQ

ASF GitHub Bot (JIRA) Thu, 15 Jun 2017 02:15:23 -0700

    [ https://issues.apache.org/jira/browse/JENA-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16050208#comment-16050208 ]


ASF GitHub Bot commented on JENA-1313:
--------------------------------------

Github user afs commented on the issue:

    https://github.com/apache/jena/pull/262
  
    `SystemARQ.ValueExtensions` is a very old flag (that I'd completely forgotten 
about!) that can restrict ARQ to the minimum set of datatypes in SPARQL, so 
xsd:date, the xsd:g* types, language tags, and others are not handled by the 
built-in operators for "same value" and "compare".
    
    Looking at it, I can see that the list isn't right; I'll push a fix as 
part of this PR and tidy up some comments.
    
    It is enabled in "strict mode", which is only useful for investigating 
specs. It is unlikely to be perfect, so check the query and results to confirm.
    
    The set of "value spaces" is more detailed than that in XDM (XQuery and 
XPath Data Model): xsd:date and xsd:dateTime values can be compared using 
special rules, and durations needed a Java fix, so {{NodeValue.compare}} has 
several special cases.
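To make the flag's effect concrete, here is a hypothetical Java sketch (not Jena's code; `ValueSpaceSketch`, `ValueSpace` and `classify` are invented names) of how a boolean like `SystemARQ.ValueExtensions` could gate which datatypes get value-based "same value"/"compare" handling. The core set assumed here follows the SPARQL 1.1 operator mapping (numerics, xsd:string, xsd:boolean, xsd:dateTime); anything classified as `OTHER` would be left to whatever term-based fallback an engine uses.

```java
import java.util.Map;

public class ValueSpaceSketch {
    // Hypothetical value spaces; the names are illustrative, not Jena's API.
    enum ValueSpace { NUMERIC, STRING, BOOLEAN, DATETIME, DATE, G_YEAR, LANG_STRING, OTHER }

    static final String XSD = "http://www.w3.org/2001/XMLSchema#";
    static final String RDF_LANG_STRING =
        "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString";

    // Datatypes the SPARQL 1.1 operator mapping already compares by value
    // (derived types such as xsd:int are omitted for brevity).
    static final Map<String, ValueSpace> CORE = Map.of(
        XSD + "integer", ValueSpace.NUMERIC,
        XSD + "decimal", ValueSpace.NUMERIC,
        XSD + "float", ValueSpace.NUMERIC,
        XSD + "double", ValueSpace.NUMERIC,
        XSD + "string", ValueSpace.STRING,
        XSD + "boolean", ValueSpace.BOOLEAN,
        XSD + "dateTime", ValueSpace.DATETIME);

    // Extra value spaces, consulted only when extensions are on
    // (other xsd:g* types omitted for brevity).
    static final Map<String, ValueSpace> EXTENDED = Map.of(
        XSD + "date", ValueSpace.DATE,
        XSD + "gYear", ValueSpace.G_YEAR,
        RDF_LANG_STRING, ValueSpace.LANG_STRING);

    // valueExtensions plays the role of a flag like SystemARQ.ValueExtensions.
    static ValueSpace classify(String datatypeUri, boolean valueExtensions) {
        ValueSpace vs = CORE.get(datatypeUri);
        if (vs == null && valueExtensions) {
            vs = EXTENDED.get(datatypeUri);
        }
        return vs != null ? vs : ValueSpace.OTHER;
    }

    public static void main(String[] args) {
        System.out.println(classify(XSD + "date", true));     // DATE
        System.out.println(classify(XSD + "date", false));    // OTHER
        System.out.println(classify(XSD + "integer", false)); // NUMERIC
    }
}
```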


> Language-specific collation in ARQ
> ----------------------------------
>
>                 Key: JENA-1313
>                 URL: https://issues.apache.org/jira/browse/JENA-1313
>             Project: Apache Jena
>          Issue Type: Improvement
>          Components: ARQ
>    Affects Versions: Jena 3.2.0
>            Reporter: Osma Suominen
>
> As [discussed|http://markmail.org/message/v2bvsnsza5ksl2cv] on the users 
> mailing list in October 2016, I would like to change ARQ collation of literal 
> values to be language-aware and respect language-specific collation rules.
> This would probably involve changing at least the 
> [NodeUtils.compareLiteralsBySyntax|https://github.com/apache/jena/blob/master/jena-arq/src/main/java/org/apache/jena/sparql/util/NodeUtils.java#L199]
>  method.
> It currently sorts by lexical value first, then by language tag. Since the 
> collation order needs to be stable across all possible literal values, I 
> think the safest way would be to sort by language tag first, then by lexical 
> value according to the collation rules for that language.
> But what about subtags like {{@en-US}} or {{@pt-BR}}? Can they have different 
> collation rules than the main language? It would be a bit strange if all 
> {{@en-US}} literals sorted after {{@en}} literals...
> It would be good to check how Dydra does this and possibly take the same 
> approach. See the message linked above for further background.
> I've been talking with [~kinow] about this and he may be interested in 
> implementing it.
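The ordering proposed in the issue can be sketched with the JDK's `java.text.Collator`. This is a hypothetical, self-contained illustration, not Jena code: `Lit` stands in for a language-tagged literal, and the three comparators model (a) the lexical-then-tag order the issue describes for `compareLiteralsBySyntax`, (b) the proposed tag-then-collation order, and (c) one possible answer to the subtag question that groups on the primary subtag so `@en` and `@en-US` literals interleave. Whether the JDK's locale rules match what Dydra or a full Unicode Collation Algorithm implementation would produce is an assumption to check.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class LangCollationSketch {
    // Minimal stand-in for a language-tagged literal; NOT a Jena class.
    static final class Lit {
        final String lex;
        final String lang;
        Lit(String lex, String lang) { this.lex = lex; this.lang = lang; }
        @Override public String toString() { return "\"" + lex + "\"@" + lang; }
    }

    // Roughly the order the issue describes: lexical form first
    // (plain String.compareTo, i.e. UTF-16 code units), then language tag.
    static final Comparator<Lit> BY_LEXICAL_THEN_TAG =
        Comparator.comparing((Lit l) -> l.lex).thenComparing((Lit l) -> l.lang);

    // Cache one Collator per lower-cased tag; Collator.getInstance returns a new copy each call.
    private static final Map<String, Collator> COLLATORS = new ConcurrentHashMap<>();

    static Collator collatorFor(String tag) {
        return COLLATORS.computeIfAbsent(tag.toLowerCase(Locale.ROOT),
            t -> Collator.getInstance(Locale.forLanguageTag(t)));
    }

    // Language tags are case-insensitive, so compare them lower-cased.
    static String norm(String tag) { return tag.toLowerCase(Locale.ROOT); }

    // Primary language subtag: "en" for both "en" and "en-US".
    static String primary(String tag) {
        int dash = tag.indexOf('-');
        return norm(dash < 0 ? tag : tag.substring(0, dash));
    }

    // Proposal from the issue: language tag first, then the lexical form under
    // that language's collation rules. The final code-unit comparison keeps the
    // order total when the collator reports two different strings as equal.
    static final Comparator<Lit> BY_TAG_THEN_COLLATION = (a, b) -> {
        int c = norm(a.lang).compareTo(norm(b.lang));
        if (c != 0) return c;
        c = collatorFor(a.lang).compare(a.lex, b.lex);
        return c != 0 ? c : a.lex.compareTo(b.lex);
    };

    // Hypothetical variant for the subtag question: group on the primary subtag,
    // collate with the primary language's rules, then break ties on the full tag
    // and finally on the raw lexical form.
    static final Comparator<Lit> BY_PRIMARY_TAG_THEN_COLLATION = (a, b) -> {
        int c = primary(a.lang).compareTo(primary(b.lang));
        if (c != 0) return c;
        c = collatorFor(primary(a.lang)).compare(a.lex, b.lex);
        if (c != 0) return c;
        c = norm(a.lang).compareTo(norm(b.lang));
        return c != 0 ? c : a.lex.compareTo(b.lex);
    };

    static List<Lit> sorted(List<Lit> in, Comparator<Lit> cmp) {
        List<Lit> out = new ArrayList<>(in);
        out.sort(cmp);
        return out;
    }

    public static void main(String[] args) {
        List<Lit> lits = Arrays.asList(
            new Lit("cherry", "en"), new Lit("Banana", "en"), new Lit("apple", "en"),
            new Lit("zeste", "fr"), new Lit("\u00e9clair", "fr"), new Lit("Apple", "en-US"));
        System.out.println(sorted(lits, BY_LEXICAL_THEN_TAG));
        System.out.println(sorted(lits, BY_TAG_THEN_COLLATION));
        System.out.println(sorted(lits, BY_PRIMARY_TAG_THEN_COLLATION));
    }
}
```

With the sample literals in `main`, the first comparator puts "Banana" before "apple" and "zeste" before "éclair" (plain UTF-16 order), the second sorts the `@en` literals as apple, Banana, cherry but places "Apple"@en-US after all of them, and the third moves "Apple"@en-US in among the other English literals. The trailing tie-breaks keep each comparator a total order, which matters if the collation order must be stable across all possible literal values.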



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
