[ 
https://issues.apache.org/jira/browse/JENA-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15990089#comment-15990089
 ] 

ASF GitHub Bot commented on JENA-1313:
--------------------------------------

Github user kinow commented on the issue:

    https://github.com/apache/jena/pull/237
  
    Sorry about the mess. I reverted the previous changes and wanted to keep 
everything in the branch history in case we decided to go back that way, but 
I messed up a `git rebase`. I cherry-picked a few commits, and now it looks OK.
    
    So now this updated pull request follows a different direction. 
Instead of changing the default behaviour based on language tags, it contains 
a two-parameter "collation" function. All changes are in ARQ.
    
    Please ignore comments/unit tests/code readability/etc. for now, as this pull 
request is currently a mere suggestion of an alternative for JENA-1313, and it 
may be discarded again if there are too many problems with this 
implementation.
    
    FN_Collation.java contains the code for the new function. The first 
argument is a locale, used for finding the collator. The second argument to the 
function is the NodeValue (Expr). What the function does is quite simple, and 
possibly naïve: it extracts the string literal from the Expr part, then 
creates a new NodeValue that contains both the string and the locale.
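    
    To make the idea concrete, here is a minimal sketch of a value that carries both the string and a locale. This is not the code in the pull request: `LocalizedString` and its `collation` factory are hypothetical names, and a plain Java `String` stands in for the ARQ expression.

```java
import java.util.Locale;
import java.util.Objects;

// Hypothetical stand-in for a string value that remembers a locale.
// It is NOT the NodeValueString from the pull request.
final class LocalizedString {
    final String lexical; // the string part of the literal
    final Locale locale;  // locale later used to pick a collator

    LocalizedString(String lexical, Locale locale) {
        this.lexical = Objects.requireNonNull(lexical);
        this.locale = Objects.requireNonNull(locale);
    }

    // Mirrors the two-argument shape described above:
    // first a locale (as a language tag), then the value.
    static LocalizedString collation(String languageTag, String lexical) {
        return new LocalizedString(lexical, Locale.forLanguageTag(languageTag));
    }
}
```

    With this sketch, `LocalizedString.collation("fi", "tuoli")` yields a value whose locale is Finnish, whatever language tag the original literal had.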
    
    Further down, NodeValueString was modified as well to keep track of the 
string locale. Alternatively, we could create a new NodeValue subtype instead 
of adding an optional locale (which is a backward binary-compatible change, as 
we add methods but do not change existing ones).
    
    Then, when the SortCondition in the Query is evaluated and the 
NodeValueString#compare method is called, it checks whether it was given a 
desired locale. If so, it sorts using that locale.
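    
    As a rough illustration of that check, the sketch below falls back to plain `String#compareTo` when no locale was given and uses a `java.text.Collator` otherwise. `LocaleAwareCompare` is a hypothetical helper, not the real NodeValueString#compare.

```java
import java.text.Collator;
import java.util.Locale;

// Hypothetical helper illustrating the "use the locale if present" check.
final class LocaleAwareCompare {
    static int compare(String a, String b, Locale locale) {
        if (locale == null) {
            // No locale requested: plain code-unit ordering.
            return a.compareTo(b);
        }
        // Locale requested: delegate to the collator for that locale.
        Collator collator = Collator.getInstance(locale);
        return collator.compare(a, b);
    }
}
```

    Without a locale, "Banana" sorts before "apple" because uppercase letters have lower code points; with `Locale.ENGLISH`, the collator puts "apple" first.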
    
    Note that this function always operates in the string value space 
in ARQ: even when a literal has a language tag, the tag is discarded and only 
the string is used. Basically, any node with a literal string becomes a 
NodeValueString when this function is applied to the node.
    
    With this, users are able to choose a Collation, overriding any language 
tags. This way, if your data contains @en and @en-GB, you can decide to use any 
Collation you desire on your query.
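    
    A small sketch of that override, assuming a hypothetical `Literal` class that pairs a lexical form with a language tag (it is not a Jena class): the tags are dropped and every string is ordered with the one collator chosen by the caller.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class CollationOverride {
    // Hypothetical literal: lexical form plus language tag (e.g. "en-GB").
    static final class Literal {
        final String lexical;
        final String lang;

        Literal(String lexical, String lang) {
            this.lexical = lexical;
            this.lang = lang;
        }
    }

    // Ignores each literal's own tag and sorts by the chosen locale only.
    static List<String> sortIgnoringTags(List<Literal> literals, Locale chosen) {
        Collator collator = Collator.getInstance(chosen);
        List<String> out = new ArrayList<>();
        for (Literal literal : literals) {
            out.add(literal.lexical); // the language tag is discarded here
        }
        out.sort(collator); // Collator is a Comparator<Object>
        return out;
    }
}
```

    Literals tagged @en and @en-GB end up in one list ordered by a single set of collation rules, instead of being grouped by tag.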
    
    Thoughts?
    
    Cheers
    Bruno


> Language-specific collation in ARQ
> ----------------------------------
>
>                 Key: JENA-1313
>                 URL: https://issues.apache.org/jira/browse/JENA-1313
>             Project: Apache Jena
>          Issue Type: Improvement
>          Components: ARQ
>    Affects Versions: Jena 3.2.0
>            Reporter: Osma Suominen
>
> As [discussed|http://markmail.org/message/v2bvsnsza5ksl2cv] on the users 
> mailing list in October 2016, I would like to change ARQ collation of literal 
> values to be language-aware and respect language-specific collation rules.
> This would probably involve changing at least the 
> [NodeUtils.compareLiteralsBySyntax|https://github.com/apache/jena/blob/master/jena-arq/src/main/java/org/apache/jena/sparql/util/NodeUtils.java#L199]
>  method.
> It currently sorts by lexical value first, then by language tag. Since the 
> collation order needs to be stable across all possible literal values, I 
> think the safest way would be to sort by language tag first, then by lexical 
> value according to the collation rules for that language.
> But what about subtags like {{@en-US}} or {{@pt-BR}}? Can they have different 
> collation rules than the main language? It would be a bit strange if all 
> {{@en-US}} literals sorted after {{@en}} literals...
> It would be good to check how Dydra does this and possibly take the same 
> approach. See the message linked above for further background.
> I've been talking with [~kinow] about this and he may be interested in 
> implementing it.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
