Re: Graph on Cassandra

Claude Warren Mon, 31 Oct 2016 06:42:45 -0700

Andy,

This seems like a good approach, but it does not appear to be in the Jena code
base, which I suppose is the point of your comment about how to approach
developing work.


Does it make sense to create git clones that contain the new work?  Or
perhaps branches?

Do you have a suggestion or direction you would like to see this go?

Claude



On Fri, Oct 28, 2016 at 2:35 PM, Andy Seaborne <a...@apache.org> wrote:

> Claude,
>
> These may help:
>
> I have been thinking about an interface that is more oriented to the
> storage than the full DatasetGraph.
>
> StorageRDF breaks down all the operations into those on the default graph
> and those on named graphs.  For just a graph, simply ignore the named graph
> operations.
>
> https://github.com/afs/AFS-Dev/blob/master/src/main/java/projects/dsg2/storage/StorageRDF.java
>
> There is an adapter to the DatasetGraph hierarchy (which is needed for
> SPARQL):
>
> https://github.com/afs/AFS-Dev/blob/master/src/main/java/projects/dsg2/DatasetGraphStorage.java
>
> If you want to only use existing classes, DatasetGraphTriplesQuads is the
> place to start - used by TIM and TDB - you can implement it without needing
> quads/named graphs. Again, simply ignore the named graph calls (throw
> UnsupportedOperationException for them).
>
> Going the graph route could lead to rework later on for any kind of
> performance issues because find(S,P,O) is so narrow and precludes union
> default graph except by brute force.  DatasetGraph works with the SPARQL
> execution engine.
>
> We still need to discuss how best to approach developing work - it should
> not get sucked up by the release cycle.
>
>         Andy
>
>
> On 26/10/16 19:21, Claude Warren wrote:
>
>> My plan is to start with a Graph implementation.  We expect to write 3
>> tables: SPO, POS, OPS (I think).  Currently we don't have an easy way to
>> handle find( ANY, ANY, ANY) so I suspect we will just start with permitting
>> a column scan on Cassandra.
>>
>> I have not looked at DynamoDB but as I recall there are significant
>> differences under the hood.
>>
>> I expect that we will move on to a custom model or query engine to get the
>> best performance but that is not what we are planning for the first cut.
>>
>> I am still waiting for management approval to do this at work ....
>> sometimes it takes longer to get the paperwork done than it does to design
>> the thing.
>>
>>
>> Claude
>>
>> On Mon, Oct 17, 2016 at 6:39 PM, Paul Houle <paul.ho...@ontology2.com>
>> wrote:
>>
>>> I like DynamoDB as a target for this sort of thing.  There are many
>>> tasks which are small-scale yet critical where it would otherwise be
>>> hard to provide a distributed and reliable database.  Put that together
>>> with Lambda,  which does the same for computation,  and you are cooking
>>> with gas.
>>>
>>> I wrote a 1-1 translation of DynamoDB documents to RDF that I use
>>> throughout an application;  the code is DynamoDB idiomatic in every way,
>>>  just the application reads and writes (a constrained set of) RDF
>>> documents.
>>>
>>> Right now I dump the documents from the DynamoDB system into a triple
>>> store when I want a panoptic view,  but a distributed graph like
>>> that would mean being able to run SPARQL queries against DynamoDB
>>> directly.
>>>
>>> There are many products in the same family as Cassandra and DynamoDB and
>>> it would be good to think through the math so we can approach them all
>>> in a similar way.
>>>
>>> --
>>>   Paul Houle
>>>   paul.ho...@ontology2.com
>>>
>>> On Mon, Oct 17, 2016, at 12:31 PM, A. Soroka wrote:
>>>
>>>> Yep,
>>>>
>>>> http://iswc2011.semanticweb.org/fileadmin/iswc/Papers/Workshops/SSWS/Ladwig-et-all-SSWS2011.pdf
>>>>
>>>> indicates that they are indexing by subject. As someone who has
>>>> implemented LDP, that is definitely the approach that makes sense there.
>>>>
>>>> ---
>>>> A. Soroka
>>>> The University of Virginia Library
>>>>
>>>> On Oct 17, 2016, at 12:20 PM, Andy Seaborne <a...@apache.org> wrote:
>>>>>
>>>>> IIRC It stores CBDs indexed by subject so it is the "other" model to
>>>>> Rya.  Better for LDP (??).
>>>>>
>>>>>     Andy
>>>>>
>>>>> On 17/10/16 15:41, A. Soroka wrote:
>>>>>
>>>>>> There's also:
>>>>>>
>>>>>> https://github.com/cumulusrdf/cumulusrdf
>>>>>>
>>>>>> in a similar vein (RDF over Cassandra). Not sure what kind of
>>>>>> particular uses it expects to support.
>>>>>>
>>>>>> ---
>>>>>> A. Soroka
>>>>>> The University of Virginia Library
>>>>>>
>>>>>> On Oct 17, 2016, at 7:02 AM, Andy Seaborne <a...@apache.org> wrote:
>>>>>>>
>>>>>>> Hi Claude,
>>>>>>>
>>>>>>> There is certainly interest from me.
>>>>>>>
>>>>>>> The best thing to do depends on various factors.  By putting it in
>>>>>>> extras I presume you mean it gets added to the release?  That is not
>>>>>>> the only way forward.
>>>>>>>
>>>>>>> An important aspect of Apache is "Community over code" - will there
>>>>>>> be a community around this code?  Is that community the same as, or
>>>>>>> does it significantly overlap with, the Jena community?
>>>>>>>
>>>>>>> There are various reasons for wanting RDF over a column store -
>>>>>>> which use cases are the most important for this work?
>>>>>>>
>>>>>>> They lead to different ways of using Cassandra. For example,
>>>>>>> Rya (incubating) uses Accumulo tables as indexes, and partial scans of
>>>>>>> the table are streaming.  Other systems try to use the columns for
>>>>>>> properties, possibly more useful for LDP style than SPARQL.
>>>>>>>
>>>>>>>   Andy
>>>>>>>
>>>>>>> On 15/10/16 18:38, Claude Warren wrote:
>>>>>>>
>>>>>>>> Howdy,
>>>>>>>>
>>>>>>>> We have a project at work that is implementing Jena Graph on
>>>>>>>> Cassandra.  I am wondering if there is enough interest here to accept
>>>>>>>> it as a contribution.  I was thinking that it might fit in the Extras
>>>>>>>> category.
>>>>>>>>
>>>>>>>> I can not promise release of the code yet as I have to present it to
>>>>>>>> our internal Intellectual Property group first.
>>>>>>>>
>>>>>>>> Thoughts?
>>>>>>>>
>>>>>>>> Claude
>>>>>>>>
>>>>>>>>
>>>>>>
>>>>
>>>
>>
>>
>>
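
To make the three-table idea from the thread (SPO, POS, OPS) concrete, here is
a minimal sketch of how a find(S,P,O) pattern might be routed to one of those
tables. It is an illustration only: IndexChooser, boundPrefix and choose are
hypothetical names, not part of Jena or of the code discussed above, and the
rule used (pick the table whose column order has the longest run of bound
leading terms) is an assumption about how such a layout could be queried.

```java
import java.util.Optional;

/** Hypothetical helper for illustration; not a Jena class. */
final class IndexChooser {

    /** The three index tables named in the thread; each name is its column order. */
    enum Index { SPO, POS, OPS }

    /**
     * Counts how many leading columns of the index are bound in the pattern.
     * s, p and o are true when that term of the pattern is concrete (not ANY).
     */
    static int boundPrefix(Index index, boolean s, boolean p, boolean o) {
        int n = 0;
        for (char c : index.name().toCharArray()) {
            boolean bound = (c == 'S') ? s : (c == 'P') ? p : o;
            if (!bound) {
                break; // the usable prefix ends at the first unbound column
            }
            n++;
        }
        return n;
    }

    /**
     * Picks the table with the longest bound prefix; ties go to the table
     * declared first. Returns empty when nothing is bound, i.e. the
     * find(ANY, ANY, ANY) case that needs a full scan.
     */
    static Optional<Index> choose(boolean s, boolean p, boolean o) {
        Index best = null;
        int bestLen = 0;
        for (Index index : Index.values()) {
            int len = boundPrefix(index, s, p, o);
            if (len > bestLen) {
                best = index;
                bestLen = len;
            }
        }
        return Optional.ofNullable(best);
    }
}
```

With this rule, a pattern with only the subject bound goes to SPO, only the
predicate to POS, and only the object to OPS. A pattern with subject and object
bound but predicate unbound gets a prefix of just one column from any of these
tables, so the remaining term would have to be filtered after the lookup.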


-- 
I like: Like Like - The likeliest place on the web
<http://like-like.xenei.com>
LinkedIn: http://www.linkedin.com/in/claudewarren
