[jira] [Commented] (JENA-1313) Language-specific collation in ARQ

ASF GitHub Bot (JIRA) Fri, 12 May 2017 05:56:36 -0700

    [ 
https://issues.apache.org/jira/browse/JENA-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16008078#comment-16008078
 ]


ASF GitHub Bot commented on JENA-1313:
--------------------------------------

Github user afs commented on a diff in the pull request:

    https://github.com/apache/jena/pull/237#discussion_r116219667
  
    --- Diff: jena-arq/src/main/java/org/apache/jena/sparql/expr/nodevalue/NodeValueSortKey.java ---
    @@ -0,0 +1,91 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.jena.sparql.expr.nodevalue;
    +
    +import org.apache.jena.graph.Node;
    +import org.apache.jena.graph.NodeFactory;
    +import org.apache.jena.sparql.expr.NodeValue;
    +import org.apache.jena.sparql.util.FmtUtils;
    +
    +/**
    + * A {@link NodeValue} that supports collation value for a string. This allows query values
    + * to be sorted following rules for a specific collation.
    + */
    +public class NodeValueSortKey extends NodeValue {
    +
    +    /**
    +     * Node value text.
    +     */
    +    private final String string;
    +    /**
    +     * Node value collation language tag (e.g. fi, pt-BR, en, en-CA, etc).
    +     */
    +    private final String collation;
    +
    +    public NodeValueSortKey(final String string, final String collation) {
    +        this.string = string;
    +        this.collation = collation;
    +    }
    +
    +    public NodeValueSortKey(final String string, final String collation, Node n) {
    +        super(n);
    +        this.string = string;
    +        this.collation = collation;
    +    }
    +
    +    @Override
    +    public boolean isSortKey() {
    +        return Boolean.TRUE;
    +    }
    +
    +    @Override
    +    public String getString() {
    +        return string;
    +    }
    +
    +    @Override
    +    public String asString() {
    +        return string;
    +    }
    +
    +    @Override
    +    public String getCollation() {
    +        return collation;
    +    }
    +
    +    @Override
    +    protected Node makeNode() {
    +        return NodeFactory.createLiteral(string);
    +    }
    +
    --- End diff --
    
    Add a comment that the nodes produced by `makeNode` are fake (they don't round-trip).
    
    This could be one of the XSD binary datatypes (base64Binary, hexBinary), but
    really we have to acknowledge to ourselves that `NodeValueSortKey` is
    "internal", and having it appear in output or in expressions is not going to
    fully work.
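
The round-trip concern can be illustrated with a minimal, self-contained sketch. It deliberately does not use the Jena API: `SortKey`, `toPlainLiteral` and `fromPlainLiteral` are hypothetical stand-ins, assumed here only to mirror how the `makeNode` in the diff keeps the text and drops the collation.

```java
final class SortKeyRoundTrip {

    /** Hypothetical stand-in for the sort-key value: text plus a collation tag. */
    static final class SortKey {
        final String text;
        final String collation; // e.g. "fi"; null when unknown

        SortKey(String text, String collation) {
            this.text = text;
            this.collation = collation;
        }

        /** Mirrors the diff's makeNode(): only the text goes into the literal. */
        String toPlainLiteral() {
            return text;
        }
    }

    /** Reading the literal back yields the text only; there is no collation to restore. */
    static SortKey fromPlainLiteral(String lexical) {
        return new SortKey(lexical, null);
    }

    public static void main(String[] args) {
        SortKey original = new SortKey("Äiti", "fi");
        SortKey restored = fromPlainLiteral(original.toPlainLiteral());
        System.out.println("text kept: " + restored.text.equals(original.text));
        System.out.println("collation kept: " + (restored.collation != null));
    }
}
```

Under these assumptions the text survives the trip but the collation does not, which is why the produced node is best documented as internal-only.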


> Language-specific collation in ARQ
> ----------------------------------
>
>                 Key: JENA-1313
>                 URL: https://issues.apache.org/jira/browse/JENA-1313
>             Project: Apache Jena
>          Issue Type: Improvement
>          Components: ARQ
>    Affects Versions: Jena 3.2.0
>            Reporter: Osma Suominen
>
> As [discussed|http://markmail.org/message/v2bvsnsza5ksl2cv] on the users 
> mailing list in October 2016, I would like to change ARQ collation of literal 
> values to be language-aware and respect language-specific collation rules.
> This would probably involve changing at least the 
> [NodeUtils.compareLiteralsBySyntax|https://github.com/apache/jena/blob/master/jena-arq/src/main/java/org/apache/jena/sparql/util/NodeUtils.java#L199]
>  method.
> It currently sorts by lexical value first, then by language tag. Since the 
> collation order needs to be stable across all possible literal values, I 
> think the safest way would be to sort by language tag first, then by lexical 
> value according to the collation rules for that language.
> But what about subtags like {{@en-US}} or {{@pt-BR}}? Can they have different 
> collation rules than the main language? It would be a bit strange if all 
> {{@en-US}} literals sorted after {{@en}} literals...
> It would be good to check how Dydra does this and possibly take the same 
> approach. See the message linked above for further background.
> I've been talking with [~kinow] about this and he may be interested in 
> implementing it.
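
The ordering proposed in the issue description (language tag first, then lexical value under that language's collation rules) can be sketched with the JDK's `java.text.Collator`. This is not the ARQ implementation: `LangLiteral` and `LANG_THEN_COLLATION` are hypothetical names, tags are assumed to be already normalised to one case, and the collation rules actually applied depend on the locale data shipped with the JDK in use.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

final class LangLiteralOrdering {

    /** Hypothetical stand-in for a language-tagged literal; not a Jena class. */
    static final class LangLiteral {
        final String lexical;
        final String lang;

        LangLiteral(String lexical, String lang) {
            this.lexical = lexical;
            this.lang = lang;
        }
    }

    /** Compare by language tag first, then by text under that tag's collation rules. */
    static final Comparator<LangLiteral> LANG_THEN_COLLATION = (a, b) -> {
        int byTag = a.lang.compareTo(b.lang);
        if (byTag != 0) {
            return byTag;
        }
        // Both literals share the tag here, so one collator serves both.
        Collator collator = Collator.getInstance(Locale.forLanguageTag(a.lang));
        int byText = collator.compare(a.lexical, b.lexical);
        // Fall back to code-unit order so strings the collator treats as equal stay stable.
        return byText != 0 ? byText : a.lexical.compareTo(b.lexical);
    };

    /** Returns the literals in sorted order, rendered as "lexical@lang". */
    static List<String> sorted(List<LangLiteral> input) {
        List<LangLiteral> copy = new ArrayList<>(input);
        copy.sort(LANG_THEN_COLLATION);
        List<String> out = new ArrayList<>();
        for (LangLiteral literal : copy) {
            out.add(literal.lexical + "@" + literal.lang);
        }
        return out;
    }
}
```

Because the tags are compared as plain strings, every `en-US` literal sorts after every `en` literal in this sketch, which is exactly the subtag oddity the description raises. Creating a collator per comparison is also wasteful; a real implementation would presumably cache one per tag.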



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
