[jira] [Commented] (JENA-1313) Language-specific collation in ARQ

ASF GitHub Bot (JIRA) Wed, 12 Apr 2017 05:53:05 -0700

    [ 
https://issues.apache.org/jira/browse/JENA-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15965782#comment-15965782
 ]


ASF GitHub Bot commented on JENA-1313:
--------------------------------------

GitHub user kinow opened a pull request:

    https://github.com/apache/jena/pull/237

    JENA-1313: compare using a Collator when both literals are tagged with same 
language

    Mimics the behaviour of Dydra described [here](http://blog.dydra.com/2015/05/06/collation).
    
    When both literals are tagged with the same language, the comparison no 
longer simply compares the text. It uses a 
[java.text.Collator](https://docs.oracle.com/javase/7/docs/api/java/text/Collator.html) 
for the locale of that language to compare the strings.
    
    This does not create the collate:collate function described in JENA-1313 
as a possible solution, but it could still be useful for users who expect 
sort results to follow the collation rules of the values' language.
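
As a rough illustration of the comparison described above, here is a minimal sketch. It is not the code from the pull request: the class `LangCollationSketch` and the method `compareLiterals` are hypothetical names, literals are reduced to a lexical form plus a language tag, and the fallback (plain string order, then language tag) is an assumption based on the current behaviour described in the issue.

```java
import java.text.Collator;
import java.util.Locale;

public class LangCollationSketch {

    /**
     * Hypothetical helper, not part of Jena: compares two literals given as
     * (lexical form, language tag). An empty tag means "no language".
     */
    static int compareLiterals(String lex1, String lang1, String lex2, String lang2) {
        if (!lang1.isEmpty() && lang1.equalsIgnoreCase(lang2)) {
            // Same language on both sides: use that language's collation rules.
            Collator collator = Collator.getInstance(Locale.forLanguageTag(lang1));
            int c = collator.compare(lex1, lex2);
            if (c != 0) {
                return c;
            }
            // Assumption: break collator ties by code point so that only
            // identical strings compare as equal.
            return lex1.compareTo(lex2);
        }
        // Different or missing language tags: plain string order first,
        // then language tag.
        int c = lex1.compareTo(lex2);
        return c != 0 ? c : lang1.compareToIgnoreCase(lang2);
    }

    public static void main(String[] args) {
        // Same tag: collator order, where "a" sorts before "B".
        System.out.println(compareLiterals("a", "en", "B", "en") < 0);
        // Different tags: code point order, where "B" sorts before "a".
        System.out.println(compareLiterals("a", "en", "B", "fr") > 0);
    }
}
```

With this sketch, "a"@en sorts before "B"@en because the English collator treats case as a minor difference, while "a"@en sorts after "B"@fr because the fallback compares code points.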
    
    Needs further tests and discussion.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/kinow/jena JENA-1313-1

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/jena/pull/237.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #237
    
----
commit fdcfc6307d7d0f4cbd850adeeb48d3ca9300c266
Author: Bruno P. Kinoshita <[email protected]>
Date:   2017-04-12T12:44:42Z

    JENA-1313: compare using a Collator when both literals are tagged with same 
language

----


> Language-specific collation in ARQ
> ----------------------------------
>
>                 Key: JENA-1313
>                 URL: https://issues.apache.org/jira/browse/JENA-1313
>             Project: Apache Jena
>          Issue Type: Improvement
>          Components: ARQ
>    Affects Versions: Jena 3.2.0
>            Reporter: Osma Suominen
>
> As [discussed|http://markmail.org/message/v2bvsnsza5ksl2cv] on the users 
> mailing list in October 2016, I would like to change ARQ collation of literal 
> values to be language-aware and respect language-specific collation rules.
> This would probably involve changing at least the 
> [NodeUtils.compareLiteralsBySyntax|https://github.com/apache/jena/blob/master/jena-arq/src/main/java/org/apache/jena/sparql/util/NodeUtils.java#L199]
>  method.
> It currently sorts by lexical value first, then by language tag. Since the 
> collation order needs to be stable across all possible literal values, I 
> think the safest way would be to sort by language tag first, then by lexical 
> value according to the collation rules for that language.
> But what about subtags like {{@en-US}} or {{@pt-BR}}? Can they have different 
> collation rules than the main language? It would be a bit strange if all 
> {{@en-US}} literals sorted after {{@en}} literals...
> It would be good to check how Dydra does this and possibly take the same 
> approach. See the message linked above for further background.
> I've been talking with [~kinow] about this and he may be interested in 
> implementing it.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
