The separate library idea is a good one, but here are some factors to consider.

(1) The performance cost of going from native to string [x] string is
absurd in many cases; consider in particular cases where doubles could get
serialized and deserialized many times in the course of a calculation.  The
default, function-based API does the "right thing" for string [x] string
but allows you to override those methods if you want to back an RDF double
with a real double.
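
Here is a rough sketch of what I mean (the names are illustrative, not
the exact API in my fork): the default method parses the lexical form on
every call, and an implementation can override it to return a stored
primitive directly.

    interface Literal {
        String getLexicalForm();

        // Default: parse the lexical form on every call.  Correct for
        // string [x] string, but costly if it happens many times inside
        // a calculation.
        default double asDouble() {
            return Double.parseDouble(getLexicalForm());
        }
    }

    // Backed by a string; relies on the default method.
    final class StringBackedLiteral implements Literal {
        private final String lexical;
        StringBackedLiteral(String lexical) { this.lexical = lexical; }
        public String getLexicalForm() { return lexical; }
    }

    // Backed by a real double; serialization happens only if somebody
    // actually asks for the lexical form.
    final class DoubleBackedLiteral implements Literal {
        private final double value;
        DoubleBackedLiteral(double value) { this.value = value; }
        public String getLexicalForm() { return Double.toString(value); }
        @Override public double asDouble() { return value; }
    }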

(2) There are a number of variations around RDF parsing that involve
subjective decisions of many sorts, especially in dealing with the
inevitable ill-formed inputs.  Even worse is controlling the set of outputs
so that tool X doesn't choke.  There are also practical questions of how
exactly you map platform math to RDF math -- I like the way Clojure uses
promotion (sketched below), but others have other preferences.  How object
creation is handled is also subjective, but it can make a huge difference
in run time.
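
By promotion I mean something like the following, a hypothetical sketch
and not part of any proposal: long arithmetic that would overflow
promotes to BigInteger instead of wrapping around, in the style of
Clojure's +' operator.

    import java.math.BigInteger;

    final class PromotingMath {
        // Add two longs; on overflow, promote the result to BigInteger
        // rather than letting it wrap.
        static Number add(long a, long b) {
            try {
                return Math.addExact(a, b);
            } catch (ArithmeticException overflow) {
                return BigInteger.valueOf(a).add(BigInteger.valueOf(b));
            }
        }
    }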

And that gets back to Andy's question from some time ago about this
interface:

https://github.com/paulhoule/incubator-commonsrdf/blob/master/api/src/main/java/org/apache/commons/rdf/api/RDF.java

which gives every RDF-related object a link to a "Context" object.  The
Context object is itself a factory

https://github.com/paulhoule/incubator-commonsrdf/blob/master/api/src/main/java/org/apache/commons/rdf/api/RDFContext.java

which provides a lightweight extension facility for RDF objects.

Typically the context of an RDF object is going to be the context that
created it.  A parsimonious library will just have one context, referenced
in a static field, so the cost of this is close to zero.

If you want to have multiple RDF worlds driven by the same Java classes but
different configurations, then you put the context in an instance field at
a modest cost.  Both cases are sketched below.
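
Roughly like this (illustrative names, not the exact interfaces in the
fork):

    interface RDFContext {
        RDFTerm createBlankNode();
    }

    interface RDFTerm {
        RDFContext getContext();
    }

    // Parsimonious case: one context in a static field.  Terms carry no
    // extra state, so getContext() costs close to zero.
    final class GlobalContext implements RDFContext {
        static final GlobalContext INSTANCE = new GlobalContext();
        public RDFTerm createBlankNode() {
            return new RDFTerm() {
                public RDFContext getContext() { return INSTANCE; }
            };
        }
    }

    // Multiple-worlds case: each term keeps a reference to the context
    // that created it, at the modest cost of one field per object.
    final class WorldContext implements RDFContext {
        public RDFTerm createBlankNode() {
            return new RDFTerm() {
                public RDFContext getContext() { return WorldContext.this; }
            };
        }
    }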

The context can also be used to implement the kind of facility that Jena
has, where the RDF objects around a Model keep a reference to the Model
they came from.  Maybe this can be pasted on top of another RDF
implementation.

Past that, the context could be individual to any or all RDF objects,
providing a means of "reification" that might be as simple as wanting to
run a graph algorithm over the edges that colors each edge black or white.
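
Something like this, with entirely hypothetical names; the point is only
that a per-object context can carry algorithm state without touching the
triple itself:

    enum Color { BLACK, WHITE }

    // Per-edge context holding the algorithm's working state.
    final class EdgeContext {
        Color color = Color.WHITE;
    }

    // An "edge" (think: triple) whose context is individual to it.
    final class Edge {
        final EdgeContext context = new EdgeContext();
    }

    final class ColoringDemo {
        public static void main(String[] args) {
            Edge e = new Edge();
            e.context.color = Color.BLACK;  // mark the edge; the triple's
                                            // S/P/O are untouched
            System.out.println(e.context.color);  // prints BLACK
        }
    }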



On Sun, Mar 27, 2016 at 3:23 PM, Stian Soiland-Reyes <[email protected]>
wrote:

> Agree that the various Literal conversions could be useful for anyone
> using Commons RDF, but they would be better provided as some kind of
> LiteralUtils -- mirroring BeanUtils etc. in Apache Commons.  Would
> that live in its own module, e.g. commonsrdf-utils?
>
>
>
> On 24 March 2016 at 19:56, Andy Seaborne <[email protected]> wrote:
> >
> >
> >
> >
> > On 01/03/16 23:06, Paul Houle wrote:
> >>
> >> Personally I am not at all happy with representing literals solely as
> >> the direct product of a value and a type.
> >>
> >> The first problem is performance.  When I talk with the various burnouts
> >> and refugees from the semantic web, one of the things I hear the most
> >> about is the "RDF Tax"; performance-wise we can't afford to go through
> >> this serialization-deserialization every time we move data from one RDF
> >> toolset to another.
> >
> >
> > Moving between toolkits - or rather, working with more than one toolkit
> > at once - is an important feature for commonsrdf; otherwise it is just
> > another toolkit, which, for me at least, would make it rather less
> > interesting.
> >
> > Your proposal is to add to the Literal interface operations such as:
> >
> > Literal.asBigInteger
> > Literal.asLong
> >
> > and others to match the XSD atomic types, together with
> > createLiteralDynamic for the value-to-term direction.
> >
> > If this were done as a function:
> >
> > SomeLib.asBigInteger(Literal)
> >
> > then there are opportunities for caching.
> >
> > Either approach avoids serializing and de-serializing.
> >
> >> The other one is correctness.  If you don't have a "standard library"
> >> for parsing and unparsing dates, people are going to screw it up over
> >> and over again.  For me the whole point of having RDF is going
> >> frictionless: once data goes to RDF it stays RDF -- I don't need to
> >> write different tools to deal with JSON, XML, and spreadsheets, all of
> >> which are ill-formed to some extent or another.
> >
> >
> > Java has XMLGregorianCalendar.
> >
> > Literal.asTemporal, like all the as* methods, doesn't check the datatype
> > and works only on the lexical form.
> >
> > Again - would this be better as a library?  It is, after all, casting,
> > in the same way SPARQL has casts like xsd:integer("string").
> >
> > What I'm wondering is whether there is a basic requirement to put these
> > in the Literal interface or not.
> >
> >>
> >> If it is easier to screw up dates than to get them right, you are losing
> >> most of the benefits of RDF and you are back in the same awful world of
> >> data integration with awk, sed, and Microsoft Excel that everybody else
> >> is in, and now RDF is just another source of problems rather than of
> >> solutions.
> >>
> >> The code at my fork here
> >>
> >> https://github.com/paulhoule/incubator-commonsrdf
> >>
> >> frankly does suck, but I have yet to see a real evaluation of the ideas
> >> in it.  It comes down to four things:
> >>
> >> (i) you can implement the string [x] string interface for literals
> >> (ii) you can also pass literals around in Java object form (Integer,
> >> LocalDateTime, etc.)
> >> (iii) if you don't implement (ii), default methods will give you correct
> >> serialization and deserialization of literal values
> >> (iv) the code is ergonomic for the end user.
> >
> >
> > The other thing you have is the RDF interface whereby all RDFTerms return
> > the RDF factory (RDFContext) used to make them.
> >
> > Could you say more about this?
> >
> >     Andy
> >
> >
> >>
> >> ----
> >>
> >> Bigger picture, however, I have been thinking about a few other things:
> >>
> >> * a DSL that uses static imports to reduce the size of Jena client code
> >> considerably (these days I think the biggest difference between Java and
> >> Python is the attitude of the communities towards static imports)
> >> * from another perspective, at the low level (objects that reflect the
> >> structure of RDF) you could say that performance is not everything, it
> >> is the only thing.  That points towards some system that uses plain
> >> objects as literals, not out of any kind of ideology, but to avoid
> >> senseless allocation of objects.
> >>
> >> What I have been working on over the last few months is a system that is
> >> getting a bit complex, and I am starting to transition away from the
> >> "manipulate RDF data with RDF operators" paradigm towards "serialize and
> >> deserialize compound objects into RDF".  I found myself writing a lot of
> >> awkward and error-prone code to do that serialization and
> >> deserialization.
> >>
> >>
> >> On Tue, Mar 1, 2016 at 5:41 PM, Lewis John Mcgibbney <
> >> [email protected]> wrote:
> >>
> >>> LOL Stian
> >>>
> >>> :)
> >>>
> >>> On Tue, Mar 1, 2016 at 2:39 PM, <
> >>> [email protected]> wrote:
> >>>
> >>>> ---------- Forwarded message ----------
> >>>> From: Stian Soiland-Reyes <[email protected]>
> >>>> To: dev <[email protected]>
> >>>> Cc:
> >>>> Date: Tue, 1 Mar 2016 22:39:31 +0000
> >>>> Subject: Re: March 2016 Report
> >>>> +1 :-))
> >>>>
> >>>> Although this is starting to sound like my EU projects.. do we need
> >>>> User Stories and Personas? :)
> >>>>
> >>>>
> >>>
> >>
> >>
> >>
> >
>
>
>
> --
> Stian Soiland-Reyes
> Apache Taverna (incubating), Apache Commons RDF (incubating)
> http://orcid.org/0000-0001-9842-9718
>



-- 
Paul Houle

*Applying Schemas for Natural Language Processing, Distributed Systems,
Classification and Text Mining and Data Lakes*

(607) 539 6254    paul.houle on Skype   [email protected]

:BaseKB -- Query Freebase Data With SPARQL
http://basekb.com/gold/

Legal Entity Identifier Lookup
https://legalentityidentifier.info/lei/lookup/

Join our Data Lakes group on LinkedIn
https://www.linkedin.com/grp/home?gid=8267275
