Re: March 2016 Report

Paul Houle Tue, 01 Mar 2016 15:08:22 -0800

Personally I am not at all happy with representing literals solely as the
direct product of a value and a type.

The first problem is performance.  When I talk with the various burnouts
and refugees from the semantic web one of the things I hear the most about
is the "RDF Tax";  performance-wise we can't afford to go through this
serialization-deserialization every time we move data from one RDF toolset
to another.

The other one is correctness.  If you don't have a "standard library" for
parsing and unparsing dates people are going to screw it up over and over
again.  For me the whole point of having RDF is going frictionless:  once
data goes to RDF it stays RDF -- I don't need to write different tools to
deal with JSON,  XML,  spreadsheets all of which are ill-formed to some
extent or another.

If it is easier to screw up dates than get them right you are losing most
of the benefits of RDF and you are back in the same awful world of data
integration with awk, sed and microsoft excel that everybody else is in and
now RDF is just another source of problems rather than of solutions.

The code at my fork here

https://github.com/paulhoule/incubator-commonsrdf

frankly does suck,  and I have yet to see a real evaluation of the
ideas in it.  They come down to four things:

(i) you can implement the string [x] string interface for literals
(ii) you can also pass literals around in java object form (Integer,
LocalDateTime,  etc.)
(iii) if you don't implement (ii),  default methods will give you correct
serialization and deserialization of literal values
(iv) the code is ergonomic for the end user.
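
To make (i)-(iii) concrete,  here is a minimal sketch of the shape I have
in mind.  It is not the code in the fork and not the Commons RDF API:
TypedLiteral,  LexicalLiteral and DateTimeLiteral are made-up names,  and
the default method assumes xsd:dateTime values without a time zone.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical names throughout; this is not the Commons RDF API.
interface TypedLiteral {
    String XSD_INT = "http://www.w3.org/2001/XMLSchema#int";
    String XSD_DATETIME = "http://www.w3.org/2001/XMLSchema#dateTime";

    // (i) the string [x] string view: lexical form plus datatype IRI
    String lexicalForm();
    String datatypeIri();

    // (iii) default conversion, shared by every implementation that does
    // not keep a Java object around. Assumes dateTime values without a
    // time zone; anything unrecognized falls back to the lexical form.
    default Object value() {
        if (XSD_INT.equals(datatypeIri())) {
            return Integer.valueOf(lexicalForm());
        }
        if (XSD_DATETIME.equals(datatypeIri())) {
            return LocalDateTime.parse(lexicalForm());
        }
        return lexicalForm();
    }
}

// Implements only (i); value() comes from the default method.
final class LexicalLiteral implements TypedLiteral {
    private final String lexicalForm;
    private final String datatypeIri;

    LexicalLiteral(String lexicalForm, String datatypeIri) {
        this.lexicalForm = lexicalForm;
        this.datatypeIri = datatypeIri;
    }

    @Override public String lexicalForm() { return lexicalForm; }
    @Override public String datatypeIri() { return datatypeIri; }
}

// Implements (ii): holds the Java object and formats it only on demand.
final class DateTimeLiteral implements TypedLiteral {
    private final LocalDateTime value;

    DateTimeLiteral(LocalDateTime value) { this.value = value; }

    @Override public String lexicalForm() {
        // ISO_LOCAL_DATE_TIME always writes the seconds field
        return DateTimeFormatter.ISO_LOCAL_DATE_TIME.format(value);
    }
    @Override public String datatypeIri() { return XSD_DATETIME; }
    @Override public Object value() { return value; }
}
```

The date formatting lives in one place,  so a toolset that only knows
strings and a toolset that hands around LocalDateTime objects agree on
the lexical form without either one writing its own parser.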

----

Bigger picture,  however,  I have been thinking about a few other things:

* a DSL that uses static imports to reduce the size of Jena client code
considerably (these days I think the biggest difference between Java and
Python is the attitude of the communities towards static imports)
* from another perspective at the low level (objects that reflect the
structure of RDF) you could say that performance is not everything, it is
the only thing.  That points towards some system that uses plain objects as
literals,  not out of any kind of ideology, but to avoid senseless
allocation of objects.

What I have been working on over the last few months is a system that is
getting a bit complex,  and I am starting to transition away from the
"manipulate RDF data with RDF operators" paradigm towards a "serialize and
deserialize compound objects into RDF" one.  I found myself writing a lot of
awkward and error-prone code to do that serialization and deserialization.
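
For a flavor of what that hand-written mapping looks like,  here is a
stripped-down sketch.  Everything in it is hypothetical (Triple,  Event,
EventMapper and the example.org predicate are invented for illustration,
not taken from my system),  and literals are kept as plain Java objects
rather than wrapped:

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

// Hypothetical triple whose object may be a plain Java value.
final class Triple {
    final String subject;
    final String predicate;
    final Object object;

    Triple(String subject, String predicate, Object object) {
        this.subject = subject;
        this.predicate = predicate;
        this.object = object;
    }
}

// Hypothetical compound object to be stored as RDF.
final class Event {
    final String iri;
    final String label;
    final LocalDate date;

    Event(String iri, String label, LocalDate date) {
        this.iri = iri;
        this.label = label;
        this.date = date;
    }
}

final class EventMapper {
    static final String LABEL = "http://www.w3.org/2000/01/rdf-schema#label";
    static final String DATE = "http://example.org/vocab#date"; // made-up predicate

    // One triple per field; the LocalDate is passed through untouched.
    static List<Triple> toTriples(Event e) {
        List<Triple> out = new ArrayList<>();
        out.add(new Triple(e.iri, LABEL, e.label));
        out.add(new Triple(e.iri, DATE, e.date));
        return out;
    }

    // Rebuilds the object from the triples about one subject and fails
    // loudly if a field is missing.
    static Event fromTriples(String iri, List<Triple> triples) {
        String label = null;
        LocalDate date = null;
        for (Triple t : triples) {
            if (!t.subject.equals(iri)) {
                continue;
            }
            if (LABEL.equals(t.predicate)) {
                label = (String) t.object;
            } else if (DATE.equals(t.predicate)) {
                date = (LocalDate) t.object;
            }
        }
        if (label == null || date == null) {
            throw new IllegalArgumentException("incomplete description of " + iri);
        }
        return new Event(iri, label, date);
    }
}
```

Even with two fields the casts and null checks pile up,  and every new
class needs its own copy of this.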

On Tue, Mar 1, 2016 at 5:41 PM, Lewis John Mcgibbney <
[email protected]> wrote:

> LOL Stian
>
> :)
>
> On Tue, Mar 1, 2016 at 2:39 PM, <
> [email protected]> wrote:
>
> > ---------- Forwarded message ----------
> > From: Stian Soiland-Reyes <[email protected]>
> > To: dev <[email protected]>
> > Cc:
> > Date: Tue, 1 Mar 2016 22:39:31 +0000
> > Subject: Re: March 2016 Report
> > +1 :-))
> >
> > Although this is starting to sound like my EU projects.. do we need
> > User Stories and Personas? :)
> >
> >
>

-- 
Paul Houle

*Applying Schemas for Natural Language Processing, Distributed Systems,
Classification and Text Mining and Data Lakes*

(607) 539 6254    paul.houle on Skype   [email protected]

:BaseKB -- Query Freebase Data With SPARQL
http://basekb.com/gold/

Legal Entity Identifier Lookup
https://legalentityidentifier.info/lei/lookup/

Join our Data Lakes group on LinkedIn
https://www.linkedin.com/grp/home?gid=8267275
