[jira] [Commented] (JENA-1313) Language-specific collation in ARQ

ASF GitHub Bot (JIRA) Tue, 02 May 2017 00:17:37 -0700

    [ https://issues.apache.org/jira/browse/JENA-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15992465#comment-15992465 ]


ASF GitHub Bot commented on JENA-1313:
--------------------------------------

Github user osma commented on the issue:

    https://github.com/apache/jena/pull/237
  
    @kinow I think this looks promising! I don't have many comments on the
implementation code, but just being able to use `ORDER BY arq:collation("fi",
?label)` looks like it would solve my original problem. I do like the way you
have smuggled in the collation information so that it's accessible to the
`NodeValue.compare` function, letting it rely on `Collator.compare`, which
should be pretty fast. Maybe there's a more elegant way of doing that
smuggling, but at least it seems to get the job done based on your example
results.
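
    To make the comparison step concrete, here is a minimal standalone sketch of what relying on `Collator.compare` amounts to in plain Java. It does not use ARQ or any code from the pull request: `CollationSketch` and `sortByCollation` are hypothetical names made up for illustration, and the resulting order depends on the locale data available in the JDK.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration class; not part of Jena or the pull request.
class CollationSketch {

    // Returns a copy of the labels sorted with the collation rules
    // that the JDK provides for the given language tag (e.g. "fi").
    static List<String> sortByCollation(String langTag, List<String> labels) {
        Collator collator = Collator.getInstance(Locale.forLanguageTag(langTag));
        List<String> sorted = new ArrayList<>(labels);
        // Collator implements Comparator<Object>, so it can be passed directly.
        sorted.sort(collator);
        return sorted;
    }

    public static void main(String[] args) {
        // "\u00e4iti" is "aiti" with a-diaeresis as its first letter.
        List<String> labels = Arrays.asList("zeta", "\u00e4iti", "apple");
        // The Finnish alphabet places a-diaeresis after z, so a Finnish
        // collator is expected to sort it last, given suitable locale data.
        System.out.println("fi: " + sortByCollation("fi", labels));
        // English rules treat it as an accented variant of "a".
        System.out.println("en: " + sortByCollation("en", labels));
    }
}
```

    The same labels can therefore come out in a different order depending on the language tag, which is the effect the `"fi"` argument is meant to have in the query above.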
    
    Did you by any chance test this with the
[performance test case](https://issues.apache.org/jira/browse/JENA-1313?focusedCommentId=15963023&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15963023)
that I wrote up earlier? I'd like to know how it compares to a plain ORDER BY
in terms of performance. I can test that myself too when I find the time, but
that might take a while since many deadlines are coming up in the next few
days...


> Language-specific collation in ARQ
> ----------------------------------
>
>                 Key: JENA-1313
>                 URL: https://issues.apache.org/jira/browse/JENA-1313
>             Project: Apache Jena
>          Issue Type: Improvement
>          Components: ARQ
>    Affects Versions: Jena 3.2.0
>            Reporter: Osma Suominen
>
> As [discussed|http://markmail.org/message/v2bvsnsza5ksl2cv] on the users 
> mailing list in October 2016, I would like to change ARQ collation of literal 
> values to be language-aware and respect language-specific collation rules.
> This would probably involve changing at least the 
> [NodeUtils.compareLiteralsBySyntax|https://github.com/apache/jena/blob/master/jena-arq/src/main/java/org/apache/jena/sparql/util/NodeUtils.java#L199]
>  method.
> It currently sorts by lexical value first, then by language tag. Since the 
> collation order needs to be stable across all possible literal values, I 
> think the safest way would be to sort by language tag first, then by lexical 
> value according to the collation rules for that language.
> But what about subtags like {{@en-US}} or {{@pt-BR}}? Can they have different 
> collation rules than the main language? It would be a bit strange if all 
> {{@en-US}} literals sorted after {{@en}} literals...
> It would be good to check how Dydra does this and possibly take the same 
> approach. See the message linked above for further background.
> I've been talking with [~kinow] about this and he may be interested in 
> implementing it.
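
The ordering proposed in the description (language tag first, then lexical value under that language's collation rules) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the ARQ implementation: `LangLiteralOrderSketch`, `LangLiteral` and `sortLiterals` are hypothetical names, and the final code-point tie-break is an assumption added here so that distinct lexical values do not compare as equal.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration class; not part of Jena.
class LangLiteralOrderSketch {

    // Stand-in for a language-tagged literal (not a Jena type).
    static final class LangLiteral {
        public final String lexical;
        public final String lang;

        LangLiteral(String lexical, String lang) {
            this.lexical = lexical;
            // Language tags are case-insensitive, so normalise them.
            this.lang = lang.toLowerCase(Locale.ROOT);
        }
    }

    // Compare by language tag first, then by that language's collation rules.
    static final Comparator<LangLiteral> ORDER = (a, b) -> {
        int byLang = a.lang.compareTo(b.lang);
        if (byLang != 0) {
            return byLang;
        }
        Collator collator = Collator.getInstance(Locale.forLanguageTag(a.lang));
        int byCollation = collator.compare(a.lexical, b.lexical);
        // Assumed tie-break: fall back to code-point order.
        return byCollation != 0 ? byCollation : a.lexical.compareTo(b.lexical);
    };

    // Returns a sorted copy; the input list is left untouched.
    static List<LangLiteral> sortLiterals(List<LangLiteral> literals) {
        List<LangLiteral> sorted = new ArrayList<>(literals);
        sorted.sort(ORDER);
        return sorted;
    }
}
```

Because the tags are compared as plain strings, this sketch puts every `en-us` literal after every `en` literal, which is exactly the behaviour the description calls a bit strange. A real implementation would also want to cache the `Collator` per language tag rather than look it up on every comparison.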



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
