Re: Graph on Cassandra

Claude Warren Mon, 31 Oct 2016 07:28:22 -0700

Well, I started the process at work with Apache Jena as the target. If I
change the target I have to start the process over.  Unless there is a very
strong reason to move to Commons RDF I would prefer to stay with Jena.


Given that we want to run SPARQL queries over the data I think we want to
stay with Jena.

Claude

On Mon, Oct 31, 2016 at 2:23 PM, Stian Soiland-Reyes <st...@apache.org>
wrote:

> Do you think it would make sense to do a Cassandra Commons RDF API binding
> for Graph or Dataset..? Or would that be too high level?
>
> The streaming part would fit well there I think.
>
> Commons RDF 0.3.0 is under vote now, adding Quad, Dataset and "RDF" as the
> factory interface.
>
> https://commonsrdf.incubator.apache.org/apidocs/index.html?org/apache/commons/rdf/api/package-summary.html
>
> But it could make more sense as a Jena DatasetGraph so it can be used by
> sparql queries etc. (And then exposed as Commons RDF Jena bindings if one
> so wanted)
>
> On 31 Oct 2016 1:41 pm, "Claude Warren" <cla...@xenei.com> wrote:
>
> > Andy,
> >
> > This seems like a good approach but does not appear to be in the Jena code
> > base, which I suppose is your comment about an approach to developing work.
> >
> > Does it make sense to create git clones that contain the new work?  Or
> > perhaps branches?
> >
> > Do you have a suggestion or direction you would like to see this go?
> >
> > Claude
> >
> >
> >
> > On Fri, Oct 28, 2016 at 2:35 PM, Andy Seaborne <a...@apache.org> wrote:
> >
> > > Claude,
> > >
> > > These may help:
> > >
> > > I have been thinking about an interface that is more oriented to the
> > > storage than the full DatasetGraph.
> > >
> > > StorageRDF breaks down all the operations into those on the default graph
> > > and those on named graphs.  For just a graph, simply ignore the named graph
> > > operations.
> > >
> > > https://github.com/afs/AFS-Dev/blob/master/src/main/java/projects/dsg2/storage/StorageRDF.java
> > >
> > > There is an adapter to the DatasetGraph hierarchy (which is needed for
> > > SPARQL):
> > >
> > > https://github.com/afs/AFS-Dev/blob/master/src/main/java/projects/dsg2/DatasetGraphStorage.java
> > >
> > > If you want to use only existing classes, DatasetGraphTriplesQuads is the
> > > place to start - used by TIM and TDB - you can implement it without needing
> > > quads/named graphs. Again, simply ignore the named graph calls (throw
> > > UnsupportedOperationException for them).
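
The split described above, default-graph operations implemented and named-graph operations rejected, can be sketched roughly as follows. This is a Python analogue written only for illustration: it is not the StorageRDF or DatasetGraphTriplesQuads API, every class and method name here is hypothetical, and an in-memory set stands in for the real backing store.

```python
from typing import Iterator, Optional, Set, Tuple

Triple = Tuple[str, str, str]


class TripleOnlyStorage:
    """Hypothetical storage backend that supports only the default graph."""

    def __init__(self) -> None:
        # Stand-in for the real store (for example, Cassandra tables).
        self._triples: Set[Triple] = set()

    # --- Default graph operations: the ones actually implemented. ---

    def add(self, s: str, p: str, o: str) -> None:
        self._triples.add((s, p, o))

    def delete(self, s: str, p: str, o: str) -> None:
        self._triples.discard((s, p, o))

    def find(self, s: Optional[str] = None, p: Optional[str] = None,
             o: Optional[str] = None) -> Iterator[Triple]:
        # None plays the role of a wildcard (ANY) in each position.
        for triple in sorted(self._triples):
            if all(want is None or want == got
                   for want, got in zip((s, p, o), triple)):
                yield triple

    # --- Named graph operations: deliberately unsupported. ---

    def add_quad(self, g: str, s: str, p: str, o: str) -> None:
        raise NotImplementedError("named graphs are not supported")

    def delete_quad(self, g: str, s: str, p: str, o: str) -> None:
        raise NotImplementedError("named graphs are not supported")

    def find_quads(self, g=None, s=None, p=None, o=None):
        raise NotImplementedError("named graphs are not supported")
```

A caller that only ever touches the default graph never reaches the rejecting methods, so the backend can be filled in later without changing that caller.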
> > >
> > > Going the graph route could lead to rework later on for any kind of
> > > performance issues because find(S,P,O) is so narrow and precludes union
> > > default graph except by brute force.  DatasetGraph works with the SPARQL
> > > execution engine.
> > >
> > > We still need to discuss how best to approach developing work - it should
> > > not get sucked up by the release cycle.
> > >
> > >         Andy
> > >
> > >
> > > On 26/10/16 19:21, Claude Warren wrote:
> > >
> > >> My plan is to start with a Graph implementation.  We expect to write 3
> > >> tables: SPO, POS, OPS (I think).  Currently we don't have an easy way to
> > >> handle find(ANY, ANY, ANY) so I suspect we will just start with permitting
> > >> a column scan on Cassandra.
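
To illustrate how a find(S, P, O) pattern could be routed to one of the three tables named above, here is a minimal sketch. It assumes each table is keyed in the column order its name gives and that a lookup can only use a leading run of bound columns. The function names are hypothetical, and nothing here is taken from the actual project.

```python
from typing import Dict, Optional, Sequence, Tuple


def bound_prefix_length(order: str, bound: Dict[str, bool]) -> int:
    """Count how many leading columns of an index are bound in the pattern."""
    count = 0
    for column in order:
        if not bound[column]:
            break
        count += 1
    return count


def choose_index(s: Optional[str], p: Optional[str], o: Optional[str],
                 indexes: Sequence[str] = ("SPO", "POS", "OPS")
                 ) -> Tuple[str, int]:
    """Pick the index whose key order matches the longest bound prefix.

    None means ANY. A returned prefix length of 0 means no index helps
    and the lookup degenerates into a full scan of the chosen table.
    """
    bound = {"S": s is not None, "P": p is not None, "O": o is not None}
    # max() keeps the first index on ties, so earlier entries win.
    best = max(indexes, key=lambda order: bound_prefix_length(order, bound))
    return best, bound_prefix_length(best, bound)


print(choose_index("s", None, None))   # ('SPO', 1)
print(choose_index(None, "p", "o"))    # ('POS', 2)
print(choose_index(None, None, "o"))   # ('OPS', 1)
print(choose_index(None, None, None))  # ('SPO', 0), i.e. a full scan
print(choose_index("s", None, "o"))    # ('SPO', 1)
```

Under these assumptions the pattern (S, ANY, O) can only use one bound column with SPO, POS and OPS, and the rest has to be filtered. Passing a hypothetical third ordering, `choose_index("s", None, "o", indexes=("SPO", "POS", "OSP"))`, returns `('OSP', 2)`, which may be worth weighing given that the message itself is unsure about the third table.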
> > >>
> > >> I have not looked at DynamoDB but as I recall there are significant
> > >> differences under the hood.
> > >>
> > >> I expect that we will move on to a custom model or query engine to get the
> > >> best performance but that is not what we are planning for the first cut.
> > >>
> > >> I am still waiting for management approval to do this at work ....
> > >> sometimes it takes longer to get the paperwork done than it does to design
> > >> the thing.
> > >>
> > >>
> > >> Claude
> > >>
> > >> On Mon, Oct 17, 2016 at 6:39 PM, Paul Houle <paul.ho...@ontology2.com>
> > >> wrote:
> > >>
> > >>> I like DynamoDB as a target for this sort of thing.  There are many
> > >>> tasks which are small-scale yet critical where it would otherwise be
> > >>> hard to provide a distributed and reliable database.  Put that together
> > >>> with Lambda, which does the same for computation, and you are cooking
> > >>> with gas.
> > >>>
> > >>> I wrote a 1-1 translation of DynamoDB documents to RDF that I use
> > >>> throughout an application; the code is DynamoDB idiomatic in every way,
> > >>> just the application reads and writes (a constrained set of) RDF
> > >>> documents.
> > >>>
> > >>> Right now I dump the documents from the DynamoDB system into a triple
> > >>> store when I want a panoptic view, but a distributed graph like
> > >>> that would mean being able to run SPARQL queries against DynamoDB
> > >>> directly.
> > >>>
> > >>> There are many products in the same family as Cassandra and DynamoDB and
> > >>> it would be good to think through the math so we can approach them all
> > >>> in a similar way.
> > >>>
> > >>> --
> > >>>   Paul Houle
> > >>>   paul.ho...@ontology2.com
> > >>>
> > >>> On Mon, Oct 17, 2016, at 12:31 PM, A. Soroka wrote:
> > >>>
> > >>>> Yep,
> > >>>>
> > >>>> http://iswc2011.semanticweb.org/fileadmin/iswc/Papers/Workshops/SSWS/Ladwig-et-all-SSWS2011.pdf
> > >>>>
> > >>>> indicates that they are indexing by subject. As someone who has
> > >>>> implemented LDP, that is definitely the approach that makes sense there.
> > >>>>
> > >>>> ---
> > >>>> A. Soroka
> > >>>> The University of Virginia Library
> > >>>>
> > >>>> On Oct 17, 2016, at 12:20 PM, Andy Seaborne <a...@apache.org> wrote:
> > >>>>>
> > >>>>> IIRC it stores CBDs indexed by subject so it is the "other" model to
> > >>>>> Rya.  Better for LDP (??).
> > >>>>>
> > >>>>>     Andy
> > >>>>>
> > >>>>> On 17/10/16 15:41, A. Soroka wrote:
> > >>>>>
> > >>>>>> There's also:
> > >>>>>>
> > >>>>>> https://github.com/cumulusrdf/cumulusrdf
> > >>>>>>
> > >>>>>> in a similar vein (RDF over Cassandra). Not sure what kind of
> > >>>>>> particular uses it expects to support.
> > >>>>>>
> > >>>>>> ---
> > >>>>>> A. Soroka
> > >>>>>> The University of Virginia Library
> > >>>>>>
> > >>>>>> On Oct 17, 2016, at 7:02 AM, Andy Seaborne <a...@apache.org> wrote:
> > >>>>>>>
> > >>>>>>> Hi Claude,
> > >>>>>>>
> > >>>>>>> There is certainly interest from me.
> > >>>>>>>
> > >>>>>>> The best thing to do depends on various factors.  By putting it
> > >>>>>>> in extras I presume you mean it gets added to the release?  That is
> > >>>>>>> not the only way forward.
> > >>>>>>>
> > >>>>>>> An important aspect of Apache is "Community over code" - will there
> > >>>>>>> be a community around this code?  Is that community the same as, or
> > >>>>>>> does it significantly overlap with, the Jena community?
> > >>>>>>>
> > >>>>>>> There are various reasons for wanting RDF over a column store -
> > >>>>>>> which use cases are the most important for this work?
> > >>>>>>>
> > >>>>>>> They lead to different ways of using Cassandra. For example,
> > >>>>>>> Rya (incubating) uses Accumulo tables as indexes, and partial scans of
> > >>>>>>> the table are streaming.  Other systems try to use the columns for
> > >>>>>>> properties, possibly more useful for LDP style than SPARQL.
> > >>>>>>>
> > >>>>>>>   Andy
> > >>>>>>>
> > >>>>>>> On 15/10/16 18:38, Claude Warren wrote:
> > >>>>>>>
> > >>>>>>>> Howdy,
> > >>>>>>>>
> > >>>>>>>> We have a project at work that is implementing Jena Graph on
> > >>>>>>>> Cassandra.  I am wondering if there is enough interest here to
> > >>>>>>>> accept it as a contribution.  I was thinking that it might fit in
> > >>>>>>>> the Extras category.
> > >>>>>>>>
> > >>>>>>>> I cannot promise release of the code yet as I have to present it
> > >>>>>>>> to our internal Intellectual Property group first.
> > >>>>>>>>
> > >>>>>>>> Thoughts?
> > >>>>>>>>
> > >>>>>>>> Claude
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>
> > >>>>
> > >>>
> > >>
> > >>
> > >>
> >
> >
> > --
> > I like: Like Like - The likeliest place on the web
> > <http://like-like.xenei.com>
> > LinkedIn: http://www.linkedin.com/in/claudewarren
> >
>



-- 
I like: Like Like - The likeliest place on the web
<http://like-like.xenei.com>
LinkedIn: http://www.linkedin.com/in/claudewarren
