Github user afs commented on a diff in the pull request:
https://github.com/apache/jena/pull/308#discussion_r152531097
--- Diff:
jena-arq/src/main/java/org/apache/jena/riot/process/normalize/CanonicalizeLiteral.java
---
@@ -73,6 +76,36 @@ public Node apply(Node node) {
return n2 ;
}
+ /** Convert the lexical form to a canonical form if one of the known datatypes,
+ * otherwise return the node argument. (same object :: {@code ==})
+ */
+ public static Node canonicalValue(Node node) {
+ if ( ! node.isLiteral() )
+ return node ;
+ // Fast-track
+ if ( NodeUtils.isLangString(node) )
+ return node;
+ if ( NodeUtils.isSimpleString(node) )
+ return node;
+
+ if ( ! node.getLiteralDatatype().isValid(node.getLiteralLexicalForm()) )
+ // Invalid lexical form for the datatype - do nothing.
+ return node;
+
+ RDFDatatype dt = node.getLiteralDatatype() ;
+ // Datatype, not rdf:langString (RDF 1.1).
+ DatatypeHandler handler = dispatch.get(dt) ;
+ if ( handler == null )
+ return node ;
+ Node n2 = handler.handle(node, node.getLiteralLexicalForm(), dt) ;
+ if ( n2 == null )
+ return node ;
+ return n2 ;
+ }
+
+ /** Convert the language tag of a lexical form to a canonical form if one of the known datatypes,
+ * otherwise return the node argument. (same object; compare by {@code ==})
+ */
private static Node canonicalLangtag(String lexicalForm, String langTag) {
String langTag2 = LangTag.canonical(langTag);
if ( langTag2.equals(langTag) )
--- End diff --
Here, `node` isn't passed in, so it can't be returned. It's a style thing: the
node is already known to have a language tag, so I don't like passing in a
`Node`, which can be wrong, e.g. through a mis-call from somewhere else. Passing
lex+lang forces the arguments to be the information for a language-tagged literal.
The `null` result is tested at line 74:
```
if ( n2 == null )
return node ;
```
and elsewhere conversion also sometimes returns `null` for "no conversion",
which means no new node is needed; that is more efficient (measurably).
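
A minimal, self-contained sketch of that convention, for illustration only. It does not use Jena: `LangLiteral`, `NullMeansNoChange`, `canonicalTag` and `canonical` are hypothetical names, and lowercasing is only a stand-in for real language tag canonicalization. The helper takes lex+lang, returns `null` for "no conversion", and the caller hands back the original object.

```java
import java.util.Locale;

// Hypothetical stand-in for a language-tagged literal node (not a Jena class).
final class LangLiteral {
    final String lexicalForm;
    final String langTag;

    LangLiteral(String lexicalForm, String langTag) {
        this.lexicalForm = lexicalForm;
        this.langTag = langTag;
    }
}

final class NullMeansNoChange {
    // Assumption: lowercasing stands in for real language tag canonicalization.
    static String canonicalTag(String langTag) {
        return langTag.toLowerCase(Locale.ROOT);
    }

    // Takes lex+lang rather than a node, so it can only be called with the
    // information of a language-tagged literal.
    // Returns null for "no conversion", so no new object is allocated.
    static LangLiteral canonicalLangtag(String lexicalForm, String langTag) {
        String langTag2 = canonicalTag(langTag);
        if (langTag2.equals(langTag))
            return null;
        return new LangLiteral(lexicalForm, langTag2);
    }

    // The caller owns the original object and returns it (same object, ==)
    // when the helper reports "no conversion".
    static LangLiteral canonical(LangLiteral lit) {
        LangLiteral lit2 = canonicalLangtag(lit.lexicalForm, lit.langTag);
        if (lit2 == null)
            return lit;
        return lit2;
    }
}
```

With this shape, `canonical(lit)` returns `lit` itself when the tag is already in canonical form, and a new object only when the tag actually changes.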
---