Re: Graph on Cassandra

Stian Soiland-Reyes Mon, 31 Oct 2016 07:24:02 -0700

Do you think it would make sense to do a Cassandra Commons RDF API binding
for Graph or Dataset? Or would that be too high level?


The streaming part would fit well there I think.

Commons RDF 0.3.0 is under vote now, adding Quad, Dataset and "RDF" as the
factory interface.

https://commonsrdf.incubator.apache.org/apidocs/index.html?org/apache/commons/rdf/api/package-summary.html

But it could make more sense as a Jena DatasetGraph so it can be used by
SPARQL queries etc. (and then exposed through the Commons RDF Jena bindings
if one so wanted).
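
To make the streaming point a bit more concrete, here is a rough sketch of
exposing paged storage results as a lazy java.util.stream.Stream, which is
the shape Commons RDF uses for iterating over a Graph. PagedStream and
fetchPage are names I made up for illustration; fetchPage stands in for
whatever paging the Cassandra driver offers, and nothing here is actual
Commons RDF, Jena or Cassandra API.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.IntFunction;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

/** Hypothetical helper: turns a page-by-page lookup into a lazy Stream. */
final class PagedStream {
    private PagedStream() {}

    /**
     * fetchPage is an assumed stand-in for a storage call that returns
     * page n of the results, or an empty list once there are no more pages.
     */
    static <T> Stream<T> of(IntFunction<List<T>> fetchPage) {
        Iterator<T> it = new Iterator<T>() {
            private int nextPage = 0;
            private Iterator<T> current = Collections.emptyIterator();
            private boolean exhausted = false;

            @Override
            public boolean hasNext() {
                // Only fetch another page when the current one is used up.
                while (!current.hasNext() && !exhausted) {
                    List<T> page = fetchPage.apply(nextPage++);
                    if (page.isEmpty()) {
                        exhausted = true;
                    } else {
                        current = page.iterator();
                    }
                }
                return current.hasNext();
            }

            @Override
            public T next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return current.next();
            }
        };
        // Size is unknown up front, so wrap the iterator as an ordered,
        // sequential stream.
        return StreamSupport.stream(
                Spliterators.spliteratorUnknownSize(it, Spliterator.ORDERED), false);
    }
}
```

A caller that only wants the first few triples, e.g.
PagedStream.of(fetchPage).limit(10), should then not pull every page from
the store, which is the main attraction of streaming here.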

On 31 Oct 2016 1:41 pm, "Claude Warren" <[email protected]> wrote:

> Andy,
>
> This seems like a good approach but does not appear to be in the Jena code
> base, which I suppose is your comment about an approach to developing work.
>
> Does it make sense to create git clones that contain the new work?  Or
> perhaps branches?
>
> Do you have a suggestion or direction you would like to see this go?
>
> Claude
>
>
>
> On Fri, Oct 28, 2016 at 2:35 PM, Andy Seaborne <[email protected]> wrote:
>
> > Claude,
> >
> > These may help:
> >
> > I have been thinking about an interface that is more oriented to the
> > storage than the full DatasetGraph.
> >
> > StorageRDF breaks down all the operations into those on the default graph
> > and those on named graphs.  For just a graph, simply ignore the named graph
> > operations.
> >
> > https://github.com/afs/AFS-Dev/blob/master/src/main/java/projects/dsg2/storage/StorageRDF.java
> >
> > There is an adapter to the DatasetGraph hierarchy (which is needed for
> > SPARQL):
> >
> > https://github.com/afs/AFS-Dev/blob/master/src/main/java/projects/dsg2/DatasetGraphStorage.java
> >
> > If you want to only use existing classes, DatasetGraphTriplesQuads is the
> > place to start - used by TIM and TDB - you can implement it without needing
> > quads/named graphs. Again, simply ignore the named graph calls (throw
> > UnsupportedOperationException for them).
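
As a rough illustration of the "ignore the named graph calls" idea, a
storage-oriented interface could look something like the sketch below.
SplitStorage, SimpleTriple and TriplesOnlyStorage are hypothetical names
using plain strings for terms; this is not the real StorageRDF or
DatasetGraphTriplesQuads signature, and the in-memory set only stands in
for a Cassandra-backed store.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

/** Stand-in for an RDF triple; terms are plain strings for brevity. */
final class SimpleTriple {
    final String s;
    final String p;
    final String o;

    SimpleTriple(String s, String p, String o) {
        this.s = s;
        this.p = p;
        this.o = o;
    }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof SimpleTriple)) {
            return false;
        }
        SimpleTriple t = (SimpleTriple) other;
        return s.equals(t.s) && p.equals(t.p) && o.equals(t.o);
    }

    @Override
    public int hashCode() {
        return Objects.hash(s, p, o);
    }
}

/**
 * Hypothetical storage-oriented interface (not the real StorageRDF):
 * default graph and named graph operations are kept apart.
 */
interface SplitStorage {
    void addToDefault(String s, String p, String o);

    /** A null in any position is a wildcard. */
    List<SimpleTriple> findInDefault(String s, String p, String o);

    // A graph-only store can leave these alone and inherit the refusal.
    default void addToNamed(String g, String s, String p, String o) {
        throw new UnsupportedOperationException("named graphs not supported");
    }

    default List<SimpleTriple> findInNamed(String g, String s, String p, String o) {
        throw new UnsupportedOperationException("named graphs not supported");
    }
}

/** In-memory, triples-only implementation; a set, so duplicates collapse. */
final class TriplesOnlyStorage implements SplitStorage {
    private final Set<SimpleTriple> triples = new LinkedHashSet<>();

    @Override
    public void addToDefault(String s, String p, String o) {
        triples.add(new SimpleTriple(s, p, o));
    }

    @Override
    public List<SimpleTriple> findInDefault(String s, String p, String o) {
        List<SimpleTriple> out = new ArrayList<>();
        for (SimpleTriple t : triples) {
            if ((s == null || s.equals(t.s))
                    && (p == null || p.equals(t.p))
                    && (o == null || o.equals(t.o))) {
                out.add(t);
            }
        }
        return out;
    }
}
```

The point of the split is that a first cut only has to fill in the default
graph half, and named graph support can be added later without changing
callers.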
> >
> > Going the graph route could lead to rework later on for any kind of
> > performance issues because find(S,P,O) is so narrow and precludes union
> > default graph except by brute force.  DatasetGraph works with the SPARQL
> > execution engine.
> >
> > We still need to discuss how best to approach developing work - it should
> > not get sucked up by the release cycle.
> >
> >         Andy
> >
> >
> > On 26/10/16 19:21, Claude Warren wrote:
> >
> >> My plan is to start with a Graph implementation.  We expect to write 3
> >> tables: SPO, POS, OPS (I think).  Currently we don't have an easy way to
> >> handle find(ANY, ANY, ANY) so I suspect we will just start with
> >> permitting a column scan on Cassandra.
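
For what it's worth, picking a table for a given find() pattern could be
sketched roughly as below: choose the table whose key order has the longest
run of bound terms at the front. IndexTable, IndexChoice and IndexChooser
are hypothetical names, terms are plain strings and null means ANY; the
table names simply follow the SPO, POS, OPS list above and say nothing
about the actual Cassandra schema.

```java
/** Hypothetical index tables; each name is the key column order. */
enum IndexTable { SPO, POS, OPS }

/** Outcome of planning one find() pattern. */
final class IndexChoice {
    final IndexTable table;
    final int prefixLength;     // leading key columns fixed by the pattern
    final boolean needsFilter;  // a bound term is not covered by the prefix

    IndexChoice(IndexTable table, int prefixLength, boolean needsFilter) {
        this.table = table;
        this.prefixLength = prefixLength;
        this.needsFilter = needsFilter;
    }
}

final class IndexChooser {
    private IndexChooser() {}

    /** null in any position means ANY; ties go to the first table listed. */
    static IndexChoice choose(String s, String p, String o) {
        boolean hasS = s != null;
        boolean hasP = p != null;
        boolean hasO = o != null;
        int bound = (hasS ? 1 : 0) + (hasP ? 1 : 0) + (hasO ? 1 : 0);

        IndexTable bestTable = null;
        int bestPrefix = -1;
        for (IndexTable t : IndexTable.values()) {
            int prefix = prefixLength(t, hasS, hasP, hasO);
            if (prefix > bestPrefix) {
                bestTable = t;
                bestPrefix = prefix;
            }
        }
        // prefixLength == 0 means no key column is fixed: a full scan.
        return new IndexChoice(bestTable, bestPrefix, bestPrefix < bound);
    }

    /** Counts how many leading key columns of the table are bound. */
    static int prefixLength(IndexTable t, boolean hasS, boolean hasP, boolean hasO) {
        int n = 0;
        for (char c : t.name().toCharArray()) {
            boolean isBound = (c == 'S') ? hasS : (c == 'P') ? hasP : hasO;
            if (!isBound) {
                break;
            }
            n++;
        }
        return n;
    }
}
```

With these three key orders, find(S, ANY, O) only gets a one-term prefix
and has to filter on the other term; an OSP ordering in place of OPS would
cover that pattern with a two-term prefix. find(ANY, ANY, ANY) comes out
with prefix length 0, which is the scan case mentioned above.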
> >>
> >> I have not looked at DynamoDB but as I recall there are significant
> >> differences under the hood.
> >>
> >> I expect that we will move on to a custom model or query engine to get the
> >> best performance but that is not what we are planning for the first cut.
> >>
> >> I am still waiting for management approval to do this at work ....
> >> sometimes it takes longer to get the paperwork done than it does to design
> >> the thing.
> >>
> >>
> >> Claude
> >>
> >> On Mon, Oct 17, 2016 at 6:39 PM, Paul Houle <[email protected]>
> >> wrote:
> >>
> >>> I like DynamoDB as a target for this sort of thing.  There are many
> >>> tasks which are small-scale yet critical where it would otherwise be
> >>> hard to provide a distributed and reliable database.  Put that together
> >>> with Lambda, which does the same for computation, and you are cooking
> >>> with gas.
> >>>
> >>> I wrote a 1-1 translation of DynamoDB documents to RDF that I use
> >>> throughout an application; the code is DynamoDB idiomatic in every way,
> >>> just the application reads and writes (a constrained set of) RDF
> >>> documents.
> >>>
> >>> Right now I dump the documents from the DynamoDB system into a triple
> >>> store when I want a panoptic view, but a distributed graph like
> >>> that would mean being able to run SPARQL queries against DynamoDB
> >>> directly.
> >>>
> >>> There are many products in the same family as Cassandra and DynamoDB and
> >>> it would be good to think through the math so we can approach them all
> >>> in a similar way.
> >>>
> >>> --
> >>>   Paul Houle
> >>>   [email protected]
> >>>
> >>> On Mon, Oct 17, 2016, at 12:31 PM, A. Soroka wrote:
> >>>
> >>>> Yep,
> >>>>
> >>>> http://iswc2011.semanticweb.org/fileadmin/iswc/Papers/Workshops/SSWS/Ladwig-et-all-SSWS2011.pdf
> >>>>
> >>>> indicates that they are indexing by subject. As someone who has
> >>>> implemented LDP, that is definitely the approach that makes sense there.
> >>>>
> >>>> ---
> >>>> A. Soroka
> >>>> The University of Virginia Library
> >>>>
> >>>> On Oct 17, 2016, at 12:20 PM, Andy Seaborne <[email protected]> wrote:
> >>>>>
> >>>>> IIRC it stores CBDs indexed by subject so it is the "other" model to
> >>>>> Rya.  Better for LDP (??).
> >>>>>
> >>>>>     Andy
> >>>>>
> >>>>> On 17/10/16 15:41, A. Soroka wrote:
> >>>>>
> >>>>>> There's also:
> >>>>>>
> >>>>>> https://github.com/cumulusrdf/cumulusrdf
> >>>>>>
> >>>>>> in a similar vein (RDF over Cassandra). Not sure what kind of
> >>>>>> particular uses it expects to support.
> >>>>>>
> >>>>>> ---
> >>>>>> A. Soroka
> >>>>>> The University of Virginia Library
> >>>>>>
> >>>>>> On Oct 17, 2016, at 7:02 AM, Andy Seaborne <[email protected]> wrote:
> >>>>>>>
> >>>>>>> Hi Claude,
> >>>>>>>
> >>>>>>> There is certainly interest from me.
> >>>>>>>
> >>>>>>> The best thing to do depends on various factors.  By putting it
> >>>>>>> in extras I presume you mean it gets added to the release?  That is
> >>>>>>> not the only way forward.
> >>>>>>>
> >>>>>>> An important aspect of Apache is "Community over code" - will there
> >>>>>>> be a community around this code?  Is that community the same as, or
> >>>>>>> does it significantly overlap with, the Jena community?
> >>>>>>>
> >>>>>>> There are various reasons for wanting RDF over a column store -
> >>>>>>> which use cases are the most important for this work?
> >>>>>>>
> >>>>>>> They lead to different ways of using Cassandra. For example,
> >>>>>>> Rya (incubating) uses Accumulo tables as indexes, and a partial scan
> >>>>>>> of a table is streaming.  Other systems try to use the columns for
> >>>>>>> properties, possibly more useful for LDP style than SPARQL.
> >>>>>>>
> >>>>>>>   Andy
> >>>>>>>
> >>>>>>> On 15/10/16 18:38, Claude Warren wrote:
> >>>>>>>
> >>>>>>>> Howdy,
> >>>>>>>>
> >>>>>>>> We have a project at work that is implementing Jena Graph on
> >>>>>>>> Cassandra.  I am wondering if there is enough interest here to accept
> >>>>>>>> it as a contribution.  I was thinking that it might fit in the Extras
> >>>>>>>> category.
> >>>>>>>>
> >>>>>>>> I can not promise release of the code yet as I have to present it
> >>>>>>>> to our internal Intellectual Property group first.
> >>>>>>>>
> >>>>>>>> Thoughts?
> >>>>>>>>
> >>>>>>>> Claude
> >>>>>>>>
> >>>>>>>>
> >>>>>>
> >>>>
> >>>
> >>
> >>
> >>
>
>
> --
> I like: Like Like - The likeliest place on the web
> <http://like-like.xenei.com>
> LinkedIn: http://www.linkedin.com/in/claudewarren
>
