Re: Jena on Cassandra - status

Andy Seaborne Wed, 07 Dec 2016 14:00:06 -0800


On 06/12/16 15:05, Claude Warren wrote:

Looking briefly at the cumulus documentation, I am doing something
similar.  I think they will handle range scanning better than this will
with the current implementation, but I am fairly certain that can be fixed
in the future.

My plans are:

   1. Get the unit tests done
   2. Get the assembler code working
   3. Expand on the execution strategy.  For example, I think that a custom
   StageGenerator would probably help a lot.  Perhaps some join optimization
   as well.


StageGenerator is a bit of a dead end for this in the long term.  Single graph only.

Implementing OpExecutor for this store is the same amount of work for just joins and has the possibility of more work being sent to Cassandra in the future (quads, leftjoin, simple filtering).


Subclass and override
   protected QueryIterator execute(OpBGP opBGP, QueryIterator input)

Have you considered putting values into the indexes as well as the RDF term for the object field? Then Cassandra can do some FILTERs, which would enable large amounts of data to be scanned/filtered in parallel. Something to learn from SDB.
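
To make the value-column idea concrete, here is a minimal sketch of deriving a numeric value from a literal at write time, so it could be stored next to the RDF term. The class and method names are hypothetical, only a handful of XSD numeric datatypes are handled, and using a double loses precision for large integers and decimals; a real schema would need to decide how to treat those.

```java
// Hypothetical sketch: not part of Jena, SDB, or the Cassandra driver.
final class ValueColumnSketch {

    static final String XSD = "http://www.w3.org/2001/XMLSchema#";

    // Returns the numeric value to store alongside the object term,
    // or null when the literal is not one of the handled numeric types
    // or its lexical form does not parse.
    static Double numericValue(String lexical, String datatypeIri) {
        if (lexical == null || datatypeIri == null || !datatypeIri.startsWith(XSD)) {
            return null;
        }
        switch (datatypeIri.substring(XSD.length())) {
            case "integer":
            case "int":
            case "long":
            case "decimal":
            case "float":
            case "double":
                try {
                    return Double.valueOf(lexical.trim());
                } catch (NumberFormatException e) {
                    return null; // e.g. "INF" or a malformed literal
                }
            default:
                return null;
        }
    }
}
```

With such a column populated, a FILTER like ?o > 10 could in principle become a range predicate on the value column rather than a client-side test on every term.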

The other factor driving scale is being able to do merge joins over streaming results from Cassandra. Statement.setFetchSize seems to be the way to get repeated small blocks of results in an overall large return, which is perfect. This will affect the choice of indexes.
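
As a rough illustration of the merge join, the sketch below joins two key-sorted streams using plain Java iterators. Everything here is hypothetical: the iterators stand in for paged result sets, and the rows are simple (key, value) pairs such as (subject, object). It assumes both inputs arrive sorted on the join key, which is why the choice of indexes matters.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: the iterators stand in for streamed, paged results.
final class MergeJoinSketch {

    // Joins two streams sorted by key; emits "key|leftValue|rightValue".
    static List<String> mergeJoin(Iterator<Map.Entry<String, String>> left,
                                  Iterator<Map.Entry<String, String>> right) {
        List<String> out = new ArrayList<>();
        Map.Entry<String, String> l = next(left);
        Map.Entry<String, String> r = next(right);
        while (l != null && r != null) {
            int cmp = l.getKey().compareTo(r.getKey());
            if (cmp < 0) {
                l = next(left);           // left key is behind, advance left
            } else if (cmp > 0) {
                r = next(right);          // right key is behind, advance right
            } else {
                String key = l.getKey();
                // Buffer only the right-hand rows that share this key.
                List<String> group = new ArrayList<>();
                while (r != null && r.getKey().equals(key)) {
                    group.add(r.getValue());
                    r = next(right);
                }
                // Pair every left row with this key against the buffered group.
                while (l != null && l.getKey().equals(key)) {
                    for (String rv : group) {
                        out.add(key + "|" + l.getValue() + "|" + rv);
                    }
                    l = next(left);
                }
            }
        }
        return out;
    }

    private static Map.Entry<String, String> next(Iterator<Map.Entry<String, String>> it) {
        return it.hasNext() ? it.next() : null;
    }
}
```

The client only ever holds the current row from each side plus one group of equal keys, so memory use stays small however large the overall result is.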

Otherwise, a parallel hash join will mean multiple CQL statements can be active at one time, but that is more demanding of client-side resources.
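
For comparison, a minimal sketch of the hash-join alternative, with two scans running concurrently. The Supplier arguments are hypothetical stand-ins for CQL statements, and the thread pool size is an arbitrary choice for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

// Hypothetical sketch: each Supplier stands in for one query against the store.
final class ParallelHashJoinSketch {

    // Runs both scans concurrently, hashes the build side, probes with the other.
    // Emits "key|buildValue|probeValue" in probe order.
    static List<String> hashJoin(Supplier<List<Map.Entry<String, String>>> buildScan,
                                 Supplier<List<Map.Entry<String, String>>> probeScan) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            // Both "statements" are active at the same time.
            CompletableFuture<List<Map.Entry<String, String>>> buildF =
                CompletableFuture.supplyAsync(buildScan, pool);
            CompletableFuture<List<Map.Entry<String, String>>> probeF =
                CompletableFuture.supplyAsync(probeScan, pool);

            // The whole build side is held in client memory.
            Map<String, List<String>> table = new HashMap<>();
            for (Map.Entry<String, String> row : buildF.join()) {
                table.computeIfAbsent(row.getKey(), k -> new ArrayList<>()).add(row.getValue());
            }

            List<String> out = new ArrayList<>();
            for (Map.Entry<String, String> row : probeF.join()) {
                for (String b : table.getOrDefault(row.getKey(), List.of())) {
                    out.add(row.getKey() + "|" + b + "|" + row.getValue());
                }
            }
            return out;
        } finally {
            pool.shutdown();
        }
    }
}
```

Unlike the merge join, this needs no particular ordering from the store, but the hash table (and, in this simplified version, the full probe list) lives on the client, which is the extra resource demand.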


        Andy


Claude

On Tue, Dec 6, 2016 at 2:50 PM, A. Soroka <[email protected]> wrote:

This is really neat to see, Claude! There is also:

https://github.com/cumulusrdf/cumulusrdf

which uses Cassandra to support the Sesame API. Are you using a similar
arrangement inside Cassandra?

---
A. Soroka
The University of Virginia Library

On Dec 5, 2016, at 5:42 PM, Claude Warren <[email protected]> wrote:

I have set up a quick GitHub repository with the Cassandra code (such as it is).
https://github.com/Claudenw/jena-on-cassandra

I was going to work on the assembler, but I am backing off that to get the
unit tests in place first.



On Mon, Dec 5, 2016 at 9:36 AM, Claude Warren <[email protected]> wrote:

Howdy,

For those who are wondering.

I did get permission to implement and contribute the Jena on Cassandra
code.

We have a Graph implementation and a DatasetGraph.  I still need to
implement the contract tests but I think the code works.

The plan is to implement an assembler and begin to look at StageGenerator and
other classes to take advantage of the capabilities of the Cassandra data
store.

I need to find a place to put the code (git repository).

Claude

--
I like: Like Like - The likeliest place on the web
<http://like-like.xenei.com>
LinkedIn: http://www.linkedin.com/in/claudewarren




