Re: GSoC 2014 - Develop a new in-memory RDF Dataset implementation (JENA-624)

Timothy Armstrong Wed, 19 Mar 2014 05:38:25 -0700

Hello Andy,

Thank you so much for your response. I would be very interested in making a new implementation of DatasetGraph, although I would have to learn about the issues involved in SPARQL query optimization, as I have not studied those issues. I would also have to learn more about parallel programming. Well, maybe it is late to be applying for GSoC. I would just like to get involved in an open source Semantic Web project, in any case.

Thank you also for the references to related work. I shall have to look at them in more detail. I also need to explore the connection with JSON-LD.

I have had a really large vision about how we can enhance all our object-oriented technologies with the Semantic Web technologies. Running SPARQL on object-oriented data is part of it, and I have thought ARQ would be best for that purpose. Another part is that we can enhance the object-oriented data model with many elements of the OWL data model, including much of the reasoning, anywhere the object-oriented data model is used. The intention is to let people use the OWL data model in all the object-oriented programs they write, instead of just the object-oriented data model as it is.

What I would really like, though, would be if we could get more data on the Semantic Web and make it larger, with all the object-oriented data in the world that people are willing to post. I have thought what we really want to do is set up SPARQL endpoints on object databases. Another source of object-oriented data in the world is object-relational mapping. I'm not entirely sure, but I have thought it might also be possible to set up SPARQL endpoints on data sources of object-relational mapping, by treating the data as object-oriented data.

I do have in mind how to implement at least large parts of the functionality of the Graph, Node, and Triple interfaces backed by object-oriented data, and I have a lot of code working toward that purpose in Java. My understanding of ARQ is that it would be sufficient in order to run SPARQL SELECT, CONSTRUCT, and Update queries on object-oriented data if we would just implement the Model interface, or related interfaces, backed by object-oriented data. Is that correct?

As I have in mind, there would be one implementation of Model for each piece of object database software, or maybe for each piece of object-relational mapping software, although the implementations could have a lot of code in common. There would also be a Model for an arbitrary Java Collection of Java objects that the user would supply. Additionally, there would be a Model to use in Java programs that would consist of all the Java objects in memory that have not been garbage collected, which we could use to run SPARQL on all the objects in memory. (I have means of accessing main memory in Java with AspectJ.)
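
To make the Collection case concrete, here is a minimal sketch in plain Java, assuming the simplest possible reading of "objects as triples": each object is a subject, each declared field name is a predicate, and the field value is the object. It does not use the Jena API and is not the software described in this thread; the names ObjectTriples, Triple, triplesOf, find, and Person are all hypothetical and chosen only for illustration.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

public class ObjectTriples {

    /** Hypothetical triple: subject is the Java object, predicate is a field name. */
    public static final class Triple {
        public final Object subject;
        public final String predicate;
        public final Object object;

        public Triple(Object subject, String predicate, Object object) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }

        @Override
        public String toString() {
            return "(" + subject + " " + predicate + " " + object + ")";
        }
    }

    /** Reads every non-null declared instance field of every object as one triple. */
    public static List<Triple> triplesOf(Collection<?> objects) {
        List<Triple> out = new ArrayList<>();
        for (Object o : objects) {
            // Assumption: inherited fields are ignored to keep the sketch short.
            for (Field f : o.getClass().getDeclaredFields()) {
                if (Modifier.isStatic(f.getModifiers()) || f.isSynthetic()) {
                    continue;
                }
                try {
                    f.setAccessible(true);
                    Object value = f.get(o);
                    if (value != null) {
                        out.add(new Triple(o, f.getName(), value));
                    }
                } catch (IllegalAccessException e) {
                    throw new IllegalStateException(e);
                }
            }
        }
        return out;
    }

    /** Pattern match over the collection; a null argument acts as a wildcard. */
    public static List<Triple> find(Collection<?> objects, Object s, String p, Object o) {
        List<Triple> hits = new ArrayList<>();
        for (Triple t : triplesOf(objects)) {
            boolean subjectOk = (s == null) || t.subject == s;      // identity of the Java object
            boolean predicateOk = (p == null) || t.predicate.equals(p);
            boolean objectOk = (o == null) || t.object.equals(o);
            if (subjectOk && predicateOk && objectOk) {
                hits.add(t);
            }
        }
        return hits;
    }

    /** Hypothetical sample class used only by the demo in main. */
    public static final class Person {
        final String name;
        final Person knows;

        public Person(String name, Person knows) {
            this.name = name;
            this.knows = knows;
        }

        @Override
        public String toString() {
            return name;
        }
    }

    public static void main(String[] args) {
        Person alice = new Person("Alice", null);
        Person bob = new Person("Bob", alice);
        // Roughly the pattern "?s knows ?o" over a user-supplied Collection.
        for (Triple t : find(Arrays.asList(alice, bob), null, "knows", null)) {
            System.out.println(t);
        }
    }
}
```

The wildcard-style find is meant only to suggest the kind of pattern lookup a graph implementation would have to answer; a real implementation backed by Jena would also need to map objects and field values to RDF nodes, which this sketch leaves out.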

Well, as I say, maybe it is late to be applying for GSoC. I have just been hoping that I can make a contribution to the Semantic Web with these ideas. I need to find a conference to which to submit my article. Thank you very much again for your response.


Tim Armstrong


On 03/18/2014 09:49 AM, Andy Seaborne wrote:

Hi Tim,

The idea of this project wasn't to implement the Model interface, it was to implement the storage-level DatasetGraph interface. Jena has an implementation of Model in memory (actually, of Graph: Model is a presentation of Graph, and Graph, Node, and Triple are the key abstractions).


Aside from GSoC:

Your ideas for relating RDF access to object-oriented data sound interesting - do you have a particular source of object-oriented data in mind?

I don't know of any closely related work, which isn't to say there isn't any. Does the work on CumulusRDF, which stores RDF molecules (if I remember correctly), have any relevance? Or Haystacks (MIT), which used adjacency lists on nodes to store RDF, a different style from the "traditional" triple storage style?

I suspect the W3C "CSV on the Web" Working Group might be connected - there, data is assumed to be in regular table structures, which can be viewed as a low-level object-oriented data format.


    Andy

On 18/03/14 01:27, Timothy Armstrong wrote:

Hello,

I'm interested in contributing to Jena in Google Summer of Code 2014.
I'm a computer science Ph.D. student at North Carolina State
University.  I have studied the Semantic Web very passionately, as I
feel it is a wonderful vision.  I have taken a course in it, worked as a
research assistant on the Protein Ontology project (
http://pir.georgetown.edu/pro/pro.shtml ), and developed some open
source software for it.  I have used Jena a lot.

I have some ideas for JENA-624 (
https://issues.apache.org/jira/browse/JENA-624 ), although I am very
interested in directions you see for it, and I would be glad to work on
other issues.  There are a lot of ideas I have had for my Semantic Web
software that are related to Jena.  I would be very glad to contribute
to the Jena project in GSoC, but I would also be glad to contribute
anything in my existing software that would be useful to Jena. Well, I
realize that I am a bit late posting here for GSoC, and I am hurrying to
get my software's web site and article in a presentable form.

I came up with a very simple interpretation of object-oriented
programming, similar to connections other people have made, that treats
all object-oriented data as triples in RDF.  It means in part that we
can run SPARQL queries on any object-oriented data.  I have thought it
would be very good if we could use ARQ to run SPARQL on main memory in
object-oriented programs and on object databases.  I found that we can
post object-oriented data directly on the Semantic Web without having to
write any sort of mapping like D2RQ: either by translating
object-oriented data into an existing Semantic Web format, or by setting
up SPARQL endpoints on object databases. I would be very interested to
know whether any of this has been done before.

Regarding JENA-624, I have in mind how to create implementations of the
Jena Model interface (com.hp.hpl.jena.rdf.model.Model) backed by Java
data.  I have been thinking that it might help to run SPARQL on Java
data with ARQ if we could implement Model backed by Java data. I am
wondering if you think it would be applicable to JENA-624, or to any
other issues, if we could create implementations of Model in this
manner.  There could be both in-memory models with Java data, and disk
models with object databases.

So, I would be very glad to contribute.

Thanks,
Tim Armstrong
