[jira] [Commented] (ANY23-19) Abstract away any specific RDF APIs

Peter Ansell (Commented) (JIRA) Thu, 12 Apr 2012 23:37:11 -0700

    [ 
https://issues.apache.org/jira/browse/ANY23-19?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13253175#comment-13253175
 ]


Peter Ansell commented on ANY23-19:
-----------------------------------

Hi Paolo,

The example library that keeps being referred to here, java-rdfa, 
abstracts away from the Clerezza, Sesame and Jena interfaces by representing 
everything as strings. See 
https://github.com/shellac/java-rdfa/blob/master/core/src/main/java/net/rootdev/javardfa/StatementSink.java
The java-rdfa library is also not being actively developed or used, so it 
isn't the best example of a successful library that doesn't use a type-safe RDF 
Statement/Value API. I worked on java-rdfa a little, but there is no push behind 
it. Even its author refers to it as "The cruftiest RDFa parser in the world".

Two of the three goals of Any23, the command line utility and the web service, 
are completely indifferent to the technology being used internally, as long as 
it is high quality, and the Sesame libraries are very high quality in my opinion 
and experience. The only goal that would be affected is the use of Any23 as a 
library. How often is Any23 currently being used as a library? Could another 
project easily implement the same functionality using Jena with less effort 
than it would take to either create a custom string-based solution or move to 
another framework that may have just as many or more dependencies?

In terms of the use of Any23 as a library, is there anything about the Sesame 
Model hierarchy (Value/Resource/Literal/BNode/URI) that would be better 
represented using a custom solution? As one example, I have been working with 
OWLAPI recently and its RDF handling is shocking: it merges URIs with blank 
nodes to form what it refers to as IRIs. It has a custom internal solution that 
only recognises two types of triples: those with an IRI in the object position 
(where IRI does not type-safely distinguish between URI and BlankNode) and those 
with Literals in the object position. I can't imagine Any23 going down this 
route, but it is the worst-case scenario if the API is converted without a reason. In 
the simplest scenario, it may be possible to reuse the Sesame Model hierarchy 
to produce Values that work across all three libraries, using custom 
implementations (ValueImpl etc.) that actually implement the relevant interfaces 
from the other libraries, along with a custom ValueFactory to produce these 
multi-library-compatible Values (custom ValueFactories can be plugged into any 
Rio parser using Rio.createParser(RDFFormat, ValueFactory), a functionality 
which I haven't seen in other libraries).
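
To make the type-safety point concrete, here is a minimal sketch of a value 
hierarchy in which URIs, blank nodes and literals are distinct types. All of 
the types below (Value, Resource, Uri, BlankNode, Literal, ObjectKinds) are 
hypothetical stand-ins written for this illustration. They echo the shape of 
the Sesame model hierarchy but are not the Sesame classes.

```java
// Hypothetical miniature of a type-safe RDF value hierarchy. The names echo
// Sesame's Value/Resource/URI/BNode/Literal, but these are NOT the Sesame classes.
interface Value {
    String stringValue();
}

interface Resource extends Value {
}

final class Uri implements Resource {
    private final String uri;
    Uri(String uri) { this.uri = uri; }
    @Override public String stringValue() { return uri; }
}

final class BlankNode implements Resource {
    private final String id;
    BlankNode(String id) { this.id = id; }
    @Override public String stringValue() { return id; }
}

final class Literal implements Value {
    private final String label;
    Literal(String label) { this.label = label; }
    @Override public String stringValue() { return label; }
}

final class ObjectKinds {
    // Classifies the object of a triple. Because Uri and BlankNode are
    // separate types, a blank node cannot be mistaken for a global identifier.
    static String describe(Value object) {
        if (object instanceof Uri) {
            return "URI <" + object.stringValue() + ">";
        }
        if (object instanceof BlankNode) {
            return "blank node _:" + object.stringValue();
        }
        if (object instanceof Literal) {
            return "literal \"" + object.stringValue() + "\"";
        }
        throw new IllegalArgumentException("unknown value type: " + object);
    }

    public static void main(String[] args) {
        System.out.println(describe(new Uri("http://example.org/alice")));
        System.out.println(describe(new BlankNode("b0")));
        System.out.println(describe(new Literal("Alice")));
    }
}
```

A scheme that folds Uri and BlankNode into one type would lose the second 
branch of describe, which is the distinction the OWLAPI example gives up.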

In terms of the actual packages that are currently used, there are five basic 
packages: sesame-model, sesame-rio-api, sesame-repository-api, sesame-sail-api 
and sesame-sail-memory. These base libraries are small dependencies. One other 
dependency, sesame-util, provides some small utilities that are used by 
sesame-model and other Sesame libraries.

82K - sesame-model-2.6.5.jar
36K - sesame-repository-api-2.6.5.jar
22K - sesame-rio-api-2.6.5.jar
56K - sesame-sail-api-2.6.5.jar
54K - sesame-sail-memory-2.6.5.jar
53K - sesame-util-2.6.5.jar

The value Impl classes should not be directly referenced. They should be 
created using a ValueFactory and used through their interfaces. This doesn't 
change any of the libraries that are used, but it is better practice.
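
The practice described above can be sketched as follows. The interfaces and 
classes here (Literal, ValueFactory, SimpleLiteral, SimpleValueFactory, 
CountingValueFactory, TitleExtractor) are hypothetical and only imitate the 
pattern. The point is that calling code names the factory and the interfaces, 
never the implementation class, so a different factory can be substituted 
without touching the caller.

```java
// Hypothetical stand-ins for the interfaces a model API would expose.
interface Literal {
    String getLabel();
}

interface ValueFactory {
    Literal createLiteral(String label);
}

// Default implementation. Only the factory below refers to this class.
final class SimpleLiteral implements Literal {
    private final String label;
    SimpleLiteral(String label) { this.label = label; }
    @Override public String getLabel() { return label; }
}

final class SimpleValueFactory implements ValueFactory {
    @Override public Literal createLiteral(String label) {
        return new SimpleLiteral(label);
    }
}

// An alternative factory that wraps another one and counts created values.
final class CountingValueFactory implements ValueFactory {
    private final ValueFactory delegate;
    private int created;

    CountingValueFactory(ValueFactory delegate) { this.delegate = delegate; }

    @Override public Literal createLiteral(String label) {
        created++;
        return delegate.createLiteral(label);
    }

    int createdCount() { return created; }
}

// Calling code depends on ValueFactory and Literal only.
final class TitleExtractor {
    private final ValueFactory valueFactory;

    TitleExtractor(ValueFactory valueFactory) { this.valueFactory = valueFactory; }

    Literal extract(String rawTitle) {
        return valueFactory.createLiteral(rawTitle.trim());
    }
}

final class FactoryDemo {
    public static void main(String[] args) {
        CountingValueFactory counting = new CountingValueFactory(new SimpleValueFactory());
        Literal title = new TitleExtractor(counting).extract("  Any23  ");
        System.out.println(title.getLabel() + " (values created: " + counting.createdCount() + ")");
    }
}
```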

The other libraries, which contain the Rio parsers, can be linked in dynamically 
without compiling in the dependency, so using Any23 as a library would let 
people pull them in as needed. See the Rio.createParser(RDFFormat, 
ValueFactory) and Rio.createWriter(RDFFormat, OutputStream) methods. It would 
be valuable if Any23 could dynamically pull in all of its parsers and writers 
using the Rio.* static methods. Then it could be used with the absolute minimum 
number of parsers and writers for the current user. 

4.6K - sesame-rio-n3-2.6.5.jar
14K - sesame-rio-ntriples-2.6.5.jar
33K - sesame-rio-rdfxml-2.6.5.jar
17K - sesame-rio-turtle-2.6.5.jar
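
The idea of resolving parsers at run time rather than compiling them in can be 
sketched with a small registry. Everything below (RdfParser, StubParser, 
ParserRegistry and the format names) is hypothetical and is not the Rio API. 
Registration is explicit here to keep the sketch self-contained, whereas a real 
library would typically discover the available parsers from whatever jars are 
on the classpath.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.function.Supplier;

// Hypothetical parser abstraction; a real parser would also read input.
interface RdfParser {
    String formatName();
}

// Placeholder implementation standing in for a parser from an optional jar.
final class StubParser implements RdfParser {
    private final String format;
    StubParser(String format) { this.format = format; }
    @Override public String formatName() { return format; }
}

// Maps a format name to a factory for its parser. Callers ask by format and
// never reference a concrete parser class.
final class ParserRegistry {
    private final Map<String, Supplier<RdfParser>> factories = new TreeMap<>();

    void register(String format, Supplier<RdfParser> factory) {
        factories.put(format, factory);
    }

    Set<String> supportedFormats() {
        return factories.keySet();
    }

    RdfParser createParser(String format) {
        Supplier<RdfParser> factory = factories.get(format);
        if (factory == null) {
            // The parser's jar was not supplied, so the format is unavailable.
            throw new IllegalArgumentException("No parser available for format: " + format);
        }
        return factory.get();
    }
}

final class RegistryDemo {
    public static void main(String[] args) {
        ParserRegistry registry = new ParserRegistry();
        // Only the parsers this user needs are registered.
        registry.register("turtle", () -> new StubParser("turtle"));
        registry.register("ntriples", () -> new StubParser("ntriples"));

        System.out.println("Supported: " + registry.supportedFormats());
        System.out.println("Created parser for: " + registry.createParser("turtle").formatName());
    }
}
```

With this shape, leaving out a parser jar only removes one entry from the 
registry. The calling code is unchanged and fails with a clear error if it 
asks for a format that was not supplied.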

Switching to another library may cause the bloat that you say you do not want.

For example, Jena and its immediate dependencies are quite large compared to 
the modular Sesame jar files, and that doesn't include the SPARQL parsing 
libraries from ARQ, just as the Sesame libraries quoted above do not include 
the SPARQL libraries.

1.7M - jena-core-2.7.0-incubating.jar
151K - jena-iri-0.9.0-incubating.jar
3.1M - icu4j-3.4.4.jar
1.4M - xercesImpl-2.10.0.jar

                
> Abstract away any specific RDF APIs
> -----------------------------------
>
>                 Key: ANY23-19
>                 URL: https://issues.apache.org/jira/browse/ANY23-19
>             Project: Apache Any23
>          Issue Type: New Feature
>    Affects Versions: 0.7.0
>            Reporter: Paolo Castagna
>             Fix For: 0.8.0
>
>
> Any23 currently uses Sesame to work with or parse RDF. Specifically Any23 
> uses these classes from org.openrdf.* packages:
> org.openrdf.model.BNode
> org.openrdf.model.datatypes.XMLDatatypeUtil
> org.openrdf.model.impl.LiteralImpl
> org.openrdf.model.impl.URIImpl
> org.openrdf.model.impl.ValueFactoryImpl
> org.openrdf.model.Literal
> org.openrdf.model.Resource
> org.openrdf.model.Statement
> org.openrdf.model.URI
> org.openrdf.model.Value
> org.openrdf.model.ValueFactory
> org.openrdf.model.vocabulary.OWL
> org.openrdf.model.vocabulary.RDF
> org.openrdf.model.vocabulary.RDFS
> org.openrdf.model.vocabulary.XMLSchema
> org.openrdf.repository.RepositoryConnection
> org.openrdf.repository.RepositoryException
> org.openrdf.repository.RepositoryResult
> org.openrdf.repository.sail.SailRepository
> org.openrdf.rio.helpers.RDFParserBase
> org.openrdf.rio.ntriples.NTriplesParser
> org.openrdf.rio.ntriples.NTriplesUtil
> org.openrdf.rio.ntriples.NTriplesWriter
> org.openrdf.rio.ParseErrorListener
> org.openrdf.rio.ParseLocationListener
> org.openrdf.rio.RDFFormat
> org.openrdf.rio.RDFHandler
> org.openrdf.rio.RDFHandlerException
> org.openrdf.rio.RDFParseException
> org.openrdf.rio.RDFParser
> org.openrdf.rio.rdfxml.RDFXMLParser
> org.openrdf.rio.rdfxml.RDFXMLWriter
> org.openrdf.rio.turtle.TurtleWriter
> org.openrdf.sail.memory.MemoryStore
> org.openrdf.sail.Sail
> org.openrdf.sail.SailException
> Would it be possible to abstract away any specific RDF APIs to allow Any23 
> users to choose between, say: Apache Clerezza [1], Apache Jena [2], Sesame [3] 
> and/or others?
> An example of small RDF distiller which does this is java-rdfa [4]. Maybe a 
> similar agnostic (but easy to integrate) approach is possible for Any23. 
> Although, java-rdfa does not need to parse RDF content itself. 
>  [1] http://incubator.apache.org/clerezza/
>  [2] http://incubator.apache.org/jena/
>  [3] http://www.openrdf.org/
>  [4] https://github.com/shellac/java-rdfa

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
