On 06/12/16 15:05, Claude Warren wrote:
Looking briefly at the cumulus documentation, I am doing something
similar.  I think they will handle range scanning better than this will
with the current implementation, but I am fairly certain that can be fixed
in the future.

My plans are:

   1. Get the unit tests done
   2. Get the assembler code working
   3. Expand on the execution strategy.  For example, I think that a custom
   StageGenerator would probably help a lot.  Perhaps some join optimization
   as well.

StageGenerator is a bit of a dead end for this in the long term: it covers a single graph only.

Implementing OpExecutor for this store is the same amount of work for just joins, and it opens up the possibility of sending more work to Cassandra in the future (quads, leftjoin, simple filtering).

Subclass and override
   protected QueryIterator execute(OpBGP opBGP, QueryIterator input)

Have you considered putting values into the indexes as well as the RDF term for the object field? Then Cassandra can do some FILTERs, which would enable large amounts of data to be scanned/filtered in parallel. Something to learn from SDB.

The other factor to drive scale is being able to do merge joins over streaming results from Cassandra. Statement.setFetchSize seems to be the way to get repeated small blocks of results in an overall large return, which is perfect. This will affect the choice of indexes.
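
To illustrate the merge join idea, here is a minimal sketch in plain Java. It assumes both inputs arrive already sorted ascending on the join key, which is why the choice of indexes matters. The class name MergeJoinSketch, the row() helper and the string output format are hypothetical, made up for illustration; plain iterators stand in for paged Cassandra result sets, and nothing here is taken from the jena-on-cassandra code.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: merge join over two key-sorted streams.
class MergeJoinSketch {

    /** Joins two streams that are both sorted ascending on the entry key. */
    static List<String> mergeJoin(Iterator<Map.Entry<String, String>> left,
                                  Iterator<Map.Entry<String, String>> right) {
        List<String> out = new ArrayList<>();
        Map.Entry<String, String> l = next(left);
        Map.Entry<String, String> r = next(right);
        while (l != null && r != null) {
            int cmp = l.getKey().compareTo(r.getKey());
            if (cmp < 0) {
                l = next(left);            // left key is behind: advance left
            } else if (cmp > 0) {
                r = next(right);           // right key is behind: advance right
            } else {
                String key = l.getKey();
                // Buffer only the right-hand rows sharing the current key.
                List<String> group = new ArrayList<>();
                while (r != null && r.getKey().equals(key)) {
                    group.add(r.getValue());
                    r = next(right);
                }
                // Pair every left row with that key against the buffered group.
                while (l != null && l.getKey().equals(key)) {
                    for (String rv : group) {
                        out.add(key + ":" + l.getValue() + "," + rv);
                    }
                    l = next(left);
                }
            }
        }
        return out;
    }

    /** Returns the next element, or null once the stream is exhausted. */
    private static <T> T next(Iterator<T> it) {
        return it.hasNext() ? it.next() : null;
    }

    /** Convenience constructor for a (key, value) row. */
    static Map.Entry<String, String> row(String key, String value) {
        return new SimpleEntry<>(key, value);
    }
}
```

Only the rows for the current key are held in memory at any time, so the client-side cost stays small as long as each side is delivered in key order in modest blocks.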

Otherwise a parallel hash join will mean multiple CQL statements can be active at one time but that is more demanding of client-side resources.
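
For comparison, a rough sketch of the hash join alternative, again in plain Java. The two Supplier arguments are stand-ins for two CQL statements running at the same time; the class name ParallelHashJoinSketch, the String[] {key, value} row shape and the output format are all assumptions for illustration, not anything from the Cassandra driver or the jena-on-cassandra code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

// Hypothetical sketch: hash join where both inputs are fetched concurrently.
class ParallelHashJoinSketch {

    /** Each row is {key, value}; the suppliers stand in for two concurrent queries. */
    static List<String> hashJoin(Supplier<List<String[]>> buildSide,
                                 Supplier<List<String[]>> probeSide) {
        // Start both fetches at once, so two "statements" are active together.
        CompletableFuture<List<String[]>> buildF = CompletableFuture.supplyAsync(buildSide);
        CompletableFuture<List<String[]>> probeF = CompletableFuture.supplyAsync(probeSide);

        // Build phase: the whole build side is held in client memory.
        Map<String, List<String>> table = new HashMap<>();
        for (String[] row : buildF.join()) {
            table.computeIfAbsent(row[0], k -> new ArrayList<>()).add(row[1]);
        }

        // Probe phase: look up each probe row's key in the hash table.
        List<String> out = new ArrayList<>();
        for (String[] row : probeF.join()) {
            for (String buildValue : table.getOrDefault(row[0], List.of())) {
                out.add(row[0] + ":" + buildValue + "," + row[1]);
            }
        }
        return out;
    }
}
```

Neither input needs to be sorted here, but the build side is fully materialised on the client (and, in this simplified version, so is the probe side), which is the extra client-side demand compared with the merge join.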

        Andy


Claude

On Tue, Dec 6, 2016 at 2:50 PM, A. Soroka <[email protected]> wrote:

This is really neat to see, Claude! There is also:

https://github.com/cumulusrdf/cumulusrdf

which uses Cassandra to support the Sesame API. Are you using a similar
arrangement inside Cassandra?

---
A. Soroka
The University of Virginia Library

On Dec 5, 2016, at 5:42 PM, Claude Warren <[email protected]> wrote:

I have set up a quick GitHub repository with the Cassandra code (such as it is).
https://github.com/Claudenw/jena-on-cassandra

I was going to work on the assembler, but I am backing off that to get
the unit tests in place first.



On Mon, Dec 5, 2016 at 9:36 AM, Claude Warren <[email protected]> wrote:

Howdy,

For those who are wondering.

I did get permission to implement and contribute the Jena on Cassandra code.

We have a graph implementation and a DataSetGraph.  I still need to
implement the contract tests but I think the code works.

Plan is to implement an assembler and begin to look at StageGenerator and
other classes to take advantage of the capabilities of the Cassandra data
store.

I need to find a place to put the code (git repository).

Claude

--
I like: Like Like - The likeliest place on the web
<http://like-like.xenei.com>
LinkedIn: http://www.linkedin.com/in/claudewarren



