Re: parsing RDF literals

Paolo Castagna Mon, 20 Jun 2011 12:22:03 -0700

Paolo Castagna wrote:

Hi Andy


Andy Seaborne wrote:



On 17/06/11 08:55, Paolo Castagna wrote:

Hi Andy (hi all),
we've recently seen several people asking a similar question.

Given that there are different ways, what would be the best|recommended way?
Or, what are the pros/cons of the different ways?

From a user perspective, shouldn't we promote the NodeFactory approach,
so that we can switch implementation without disruption for users?


That's why NodeFactory.parseNode is there!

If you look at the call hierarchy, you'll see many NodeFactory.parseNode calls and few SSE.parseNode calls.

So the contents of NodeFactory.parseNode can be changed. It abstracts the implementation from the use.
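To make that indirection concrete, here is a minimal plain-Java sketch of the same idea. The names (NodeParsing, oldWay, newWay) are hypothetical, plain strings stand in for Jena Node objects, and this is not Jena code: callers only ever use one static method, so the backend can be swapped without touching them.

```java
import java.util.function.Function;

// Hypothetical sketch: one static entry point hiding which parser does the work.
// Plain strings stand in for Jena Node objects; none of these names are Jena classes.
final class NodeParsing {
    // Current implementation; callers never reference it directly.
    private static volatile Function<String, String> backend = NodeParsing::oldWay;

    private NodeParsing() {}

    // The only method callers use (the role NodeFactory.parseNode plays).
    static String parseNode(String nodeString) {
        return backend.apply(nodeString);
    }

    // Switch implementations without changing any call site.
    static void setBackend(Function<String, String> newBackend) {
        backend = newBackend;
    }

    // Placeholder backends: they only trim and tag the input so you can see which one ran.
    static String oldWay(String s) { return "old:" + s.trim(); }
    static String newWay(String s) { return "new:" + s.trim(); }
}
```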

NodeFactory is currently using SSE.parseNode(...):

public static Node parseNode(String nodeString)
{
return SSE.parseNode(nodeString) ;
}

Therefore, we have two ways: SSE (old) vs. RIOT's Tokenizer(s) (new).
The assumption is that 'new' is better|faster than 'old'. :-)
Is the plan to move from SSE to RIOT's Tokenizer(s)?

I tried, out of curiosity, to replace SSE with RIOT's Tokenizer in
the NodeFactory.parseNode implementation:

public static Node parseNode(String nodeString)
{
    return TokenizerFactory.makeTokenizerString(nodeString).next().asNode() ;
}

... and we get a few failures and errors:
Tests run: 3865, Failures: 41, Errors: 56, Skipped: 0

If you look at the errors and failures, I hope you'll see a common pattern.


Yes. They were all about datatype literals, validation and normalization
(or lack of it). I should have read the Token.asNode() method comment more
carefully; it says:

/** Token to Node, a very direct form that is purely driven off the token.
 *  Turtle and N-triples need to process the token and not call this:
 *  1/ Use bNode label as given
 *  2/ No prefix or URI resolution.
 */
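A minimal sketch of why point 1/ matters, assuming only the usual RDF rule that a blank node label such as _:a is scoped to the document it appears in. BNodeLabelScope is a hypothetical name, not a RIOT class: one instance per parse run maps each label to a freshly allocated id, so the same label in two documents yields two different blank nodes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: blank node labels are local to one document, so a parser
// keeps one scope per parse run and maps each label to a freshly allocated id.
final class BNodeLabelScope {
    // Shared allocator, so ids handed out by different scopes never collide.
    private static final AtomicLong COUNTER = new AtomicLong();

    // Labels seen in this document; not thread-safe, one scope per parse run.
    private final Map<String, String> labels = new HashMap<>();

    // Same label in this scope -> same id; a label not seen before -> a fresh id.
    String idFor(String label) {
        return labels.computeIfAbsent(label, l -> "b" + COUNTER.getAndIncrement());
    }
}
```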


SSE has default prefixes built in.
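For point 2/, a minimal sketch of expanding a prefixed name against a built-in prefix map, the step a direct token-to-node conversion skips. DefaultPrefixes is a hypothetical class; the "ex" and empty prefixes are copied from the RiotLib patch in this thread, and the "rdf" entry is the standard RDF namespace.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: expand "prefix:local" using a fixed map of default prefixes.
final class DefaultPrefixes {
    private static final Map<String, String> PREFIXES = new LinkedHashMap<>();
    static {
        PREFIXES.put("rdf", "http://www.w3.org/1999/02/22-rdf-syntax-ns#");
        PREFIXES.put("ex", "http://example/ns#");   // from the RiotLib patch
        PREFIXES.put("", "http://example/");        // from the RiotLib patch
    }

    private DefaultPrefixes() {}

    // Returns the full IRI; throws if there is no colon or the prefix is unknown.
    static String expand(String prefixedName) {
        int colon = prefixedName.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("Not a prefixed name: " + prefixedName);
        }
        String prefix = prefixedName.substring(0, colon);
        String ns = PREFIXES.get(prefix);
        if (ns == null) {
            throw new IllegalArgumentException("Unknown prefix: " + prefix);
        }
        return ns + prefixedName.substring(colon + 1);
    }
}
```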


Right. I think this is better if one wants to use RIOT to parse a String
into a Node:

public static Node parseNode(String nodeString)
{
  return RiotLib.parse(nodeString) ;
}


Not many, but there is clearly something to do, and the two are not
functionally equivalent (yet). Maybe this is an opportunity for someone
who might want to help. :-)

Sorry for all these questions, but I imagine others have similar
questions about String -> Node and Node -> String conversion.


I'm less sure that it is that.

String -> Node happens in context.

It's a matter of language design, which is what RIOT can help with.
Many of the calls to NodeFactory.parseNode are in ARQ's tests.

The question you might ask Tim is: where did the string come from?


Right. Now, this is off-topic in relation to Tim's problem.

However, I had a look at the errors. Only 3 failures after the tiny changes
in the patch attached to this message.


Down to 0 (i.e. all tests pass) if we do this in RiotLib:

Index: src/org/openjena/riot/system/RiotLib.java
===================================================================
--- src/org/openjena/riot/system/RiotLib.java   (revision 1137586)
+++ src/org/openjena/riot/system/RiotLib.java   (working copy)
@@ -20,7 +20,7 @@
 /** Misc RIOT code */
 public class RiotLib
 {
-    static ParserProfile profile = profile(Lang.TURTLE, null, null) ;
+    static ParserProfile profile = profile(Lang.TURTLE, null, ErrorHandlerFactory.errorHandlerWarn) ;
     static {
         PrefixMap pmap = profile.getPrologue().getPrefixMap() ;
         pmap.add("rdf",  ARQConstants.rdfPrefix) ;
@@ -31,6 +31,8 @@
         pmap.add("op" ,  ARQConstants.fnPrefix) ;
         pmap.add("ex" ,  "http://example/ns#") ;
         pmap.add("" ,    "http://example/") ;
+
+        profile.getPrologue().setResolver(IRIResolver.createNoResolve());
     }

     /** Parse a string to get one Node (the first token in the string) */


But I am not sure whether this is what we need/want for the
RiotLib.parse(String string) method.

By the way, the NodeFactory.parseNode(String nodeString) method is only called,
outside tests, from the ExprUtils.parseNodeValue(String s) method, which is unused.

In conclusion, I am happy to see that everything works and, if we want, we could
change the NodeFactory.parseNode implementation to use the (new) 'RIOT way'
(which is faster... ~5x?) instead of the (old) 'SSE way'.

Paolo

Not something we need to do, but I am dropping a message here so that I will
not forget about it next time someone asks about String -> Node and Node ->
String conversions. :-)

Paolo


    Andy


Thank you,
Paolo


Andy Seaborne wrote:

Tim,

There's a NodeFactory static method:

com.hp.hpl.jena.sparql.util.NodeFactory.parseNode(String str)

There is also the older way, Node SSE.parseNode(String), which taps into
the SSE [*] parser. SSE is an odd name for this; the code ended up there
and just stayed there.

The other way is:

TokenizerFactory.makeTokenizerString("...").next() ;

Tokenizers give various ways to handle a stream of tokens. One way is
from a string.

Andy

[*] http://openjena.org/wiki/SSE

On 17/06/11 00:58, Tim Harsch wrote:

If I have text in any of the various supported forms for RDF literals, what
method can I call to parse that text into a Node_Literal? It should be able to
recognize the lang tag and the typed literal string and act appropriately. I've
found NodeFactory.createLiteralNode(String lex, String lang, String datatypeURI),
but it requires that you've already parsed the literal into its lexical form,
lang tag and datatypeURI before you can use it...

Thanks,
tim
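A minimal, hand-rolled sketch of the splitting Tim describes, in plain Java with the hypothetical class LiteralParts; it is not a Jena API. It handles only "lex", "lex"@lang and "lex"^^<iri> with simple backslash escapes (no prefixed datatypes such as xsd:int, no long strings, no bare numbers). The resulting parts could then be passed to NodeFactory.createLiteralNode(lex, lang, datatypeURI); whether that method expects null or an empty string for an absent part is an assumption to check.

```java
// Hypothetical sketch: split N-Triples-style literal text into lexical form,
// language tag and datatype IRI. Absent parts are returned as null.
final class LiteralParts {
    final String lexicalForm;
    final String lang;
    final String datatypeURI;

    private LiteralParts(String lex, String lang, String dt) {
        this.lexicalForm = lex;
        this.lang = lang;
        this.datatypeURI = dt;
    }

    static LiteralParts parse(String text) {
        String s = text.trim();
        if (s.isEmpty() || s.charAt(0) != '"') {
            throw new IllegalArgumentException("Expected a quoted literal: " + text);
        }
        StringBuilder lex = new StringBuilder();
        int i = 1;
        // Scan the quoted lexical form, undoing simple backslash escapes.
        while (true) {
            if (i >= s.length()) {
                throw new IllegalArgumentException("Unterminated literal: " + text);
            }
            char c = s.charAt(i++);
            if (c == '"') break;
            if (c == '\\') {
                if (i >= s.length()) throw new IllegalArgumentException("Bad escape: " + text);
                char e = s.charAt(i++);
                switch (e) {
                    case 'n': lex.append('\n'); break;
                    case 't': lex.append('\t'); break;
                    case 'r': lex.append('\r'); break;
                    case '"': lex.append('"'); break;
                    case '\\': lex.append('\\'); break;
                    default: throw new IllegalArgumentException("Unsupported escape \\" + e);
                }
            } else {
                lex.append(c);
            }
        }
        // Whatever follows the closing quote decides the kind of literal.
        String rest = s.substring(i);
        if (rest.isEmpty()) {
            return new LiteralParts(lex.toString(), null, null);
        }
        if (rest.startsWith("@") && rest.length() > 1) {
            return new LiteralParts(lex.toString(), rest.substring(1), null);
        }
        if (rest.startsWith("^^<") && rest.endsWith(">") && rest.length() > 4) {
            return new LiteralParts(lex.toString(), null, rest.substring(3, rest.length() - 1));
        }
        throw new IllegalArgumentException("Unrecognised suffix: " + rest);
    }
}
```

For anything beyond these three forms, NodeFactory.parseNode, as Andy suggests, is the safer route.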
