Another idea with which I have been playing is to try to scale horizontally but 
only in-memory:

I could take the one-writable-graph-per-transaction dataset code I've written 
and replace the ConcurrentHashMap that currently holds the graphs with a 
Hazelcast [1] distributed map. Naive union-graph performance would be awful, 
but if the workload chiefly addressed individual graphs and the graphs were 
large enough, the parallelism might be really worthwhile.
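As a rough sketch of what that swap might look like: Hazelcast's IMap 
implements java.util.concurrent.ConcurrentMap, so if the dataset coded to 
that interface, the backing map could be either local or distributed. (The 
GraphStore class and String-stubbed graphs below are hypothetical; the real 
dataset would hold Jena Graph objects.)

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch: the dataset keeps its named graphs behind the
// ConcurrentMap interface, so the local ConcurrentHashMap could be swapped
// for a Hazelcast IMap (which also implements ConcurrentMap) without
// touching the dataset code itself.
public class GraphStore {
    private final ConcurrentMap<String, String> graphs; // Graph stubbed as String

    public GraphStore(ConcurrentMap<String, String> backing) {
        this.graphs = backing;
    }

    public void putGraph(String name, String graph) {
        graphs.put(name, graph);
    }

    public String getGraph(String name) {
        return graphs.get(name);
    }

    public static void main(String[] args) {
        // Local backing here; in the distributed case this would instead be
        // something like hazelcastInstance.getMap("graphs").
        GraphStore store = new GraphStore(new ConcurrentHashMap<>());
        store.putGraph("g1", "graph-1-data");
        System.out.println(store.getGraph("g1"));
    }
}
```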

Hazelcast offers per-entry locks [2], so those could be used instead of the 
lockable graphs I'm using now. It also offers optimistic locking via 
Map.replace( key, oldValue, newValue ), so I could even imagine offering a 
switch between a "strict mode", in which locks are used, and a "read-heavy 
mode", in which it is assumed that the application will prevent contention 
on individual graphs, but an update may fail if that assumption doesn't hold.
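The "read-heavy mode" update path might look something like the sketch 
below. Since Hazelcast's IMap shares the ConcurrentMap interface, a plain 
ConcurrentHashMap stands in here; replace(key, oldValue, newValue) has the 
same compare-and-set semantics in both. (Graphs are again stubbed as 
strings for illustration.)

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Sketch of an optimistic, lock-free update: read the current value,
// compute a replacement, and commit only if no other writer touched the
// entry in between. A Hazelcast IMap would behave the same way, but
// cluster-wide.
public class OptimisticUpdate {
    public static void main(String[] args) {
        ConcurrentMap<String, String> graphs = new ConcurrentHashMap<>();
        graphs.put("g1", "v1");

        // Happy path: nobody else changed the entry, so the CAS succeeds.
        String old = graphs.get("g1");
        boolean committed = graphs.replace("g1", old, old + "+triple");
        System.out.println(committed + " " + graphs.get("g1"));

        // Stale update: the expected old value no longer matches, so the
        // replace fails instead of clobbering the newer value.
        boolean stale = graphs.replace("g1", "v1", "v1+other");
        System.out.println(stale + " " + graphs.get("g1"));
    }
}
```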

Hazelcast also offers some support for remote computation at the entries of its 
distributed maps [3], so it might be possible to distribute 
findInSpecificNamedGraph() executions (maybe eventually some of the ARQ 
execution as well?). It also supports a kind of query language [4] that might 
be used to obtain more efficiency, perhaps by using Bloom filters for graphs, 
as Claude has discussed before. 
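The shape of an entry-processor-style findInSpecificNamedGraph() might be 
roughly as follows. Hazelcast's IMap.executeOnKey(key, entryProcessor) ships 
the function to the cluster member owning the entry, so the computation runs 
next to the data; here a local map and a plain Function stand in for that 
machinery, and graphs are stubbed as comma-separated triple strings, so all 
the names below are illustrative only.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Local stand-in for entry processing: apply a function at a single map
// entry and return its result. In Hazelcast this would run remotely, on
// whichever member owns the key, instead of in-process.
public class EntryFind {
    static <R> R executeOnKey(Map<String, String> map, String key,
                              Function<Map.Entry<String, String>, R> processor) {
        return processor.apply(Map.entry(key, map.get(key)));
    }

    public static void main(String[] args) {
        Map<String, String> graphs = new ConcurrentHashMap<>();
        graphs.put("g1", "s1 p1 o1,s2 p2 o2");

        // "Find" the triples in graph g1 whose subject is s2, entirely at
        // the entry, without shipping the whole graph back to the caller.
        String hits = executeOnKey(graphs, "g1", entry -> {
            StringBuilder out = new StringBuilder();
            for (String triple : entry.getValue().split(",")) {
                if (triple.startsWith("s2 ")) out.append(triple);
            }
            return out.toString();
        });
        System.out.println(hits);
    }
}
```

The appeal of this shape is that only the matching triples cross the 
network, not the graph itself.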

All just food for thought, for now.

---
A. Soroka

[1] https://hazelcast.org/
[2] 
http://docs.hazelcast.org/docs/3.7/manual/html-single/index.html#locking-maps
[3] 
http://docs.hazelcast.org/docs/3.7/manual/html-single/index.html#entry-processor
[4] 
http://docs.hazelcast.org/docs/3.7/manual/html-single/index.html#distributed-query

> On Jan 20, 2017, at 8:38 PM, De Gyves <[email protected]> wrote:
> 
> I'd like to participate in the storage portion of Jena, maybe TDB. As I
> have worked many years developing with RDBMS, I'd like to explore new
> horizons of persistence, and graph-based stores seem very promising for my
> next projects, so I'd like to use SPARQL and RDF with Jena/TDB and see how
> far I can go.
> 
> So I've spent the last two days exploring the jena-dev mail archives from
> August 2015 to January of this year and found some interesting threads,
> such as the development of TDB2, the tests of 100M of BSBM data, a
> question about horizontal scaling, and the fact that anything implementing
> DatasetGraph can be used as a triple store. My readings of the Jena docs
> include: SPARQL, the RDF API, Txn, and TDB transactions.
> 
> What I am looking for is a clear perspective on some requirements which
> are taken for granted on a traditional RDBMS. These are:
> 
> 1. Atomicity, consistency, isolation and durability of a transaction on a
> single TDB database: apart from the limitations noted in the documentation
> of TDB Transactions and Txn, are there current issues? Edge cases detected
> but not yet covered?
> 2. Are there currently available strategies for achieving a horizontally
> scaled TDB database?
> 3. What do you think of trying to implement horizontal scalability with
> DatasetGraph or something else on top of, say, CockroachDB, VoltDB,
> PostgreSQL, etc.?
> 4. If there are stress tests available (e.g. I read about a 100M BSBM
> test), are they included in the source? Or may I have a copy? I'd like to
> see what the limits of the current TDB are, and maybe of TDB2: maximum
> on-disk size of a dataset, maximum number of nodes in a dataset, of models
> or graphs in a dataset, the limiting behavior of a typical read/write
> transaction vs. the number of nodes, datasets, etcetera. Or some
> guidelines, so I can start to write this stress code. Would it be useful
> to you as well?
> 
> -- 
> Víctor-Polo de Gyvés Montero.
> +52 (55) 4926 9478 (Cellphone in Mexico City)
> Address: Daniel Delgadillo 7 6A, Agricultura neighborhood, Miguel Hidalgo
> borough
> ZIP: 11360, Mexico City.
> 
> http://degyves.googlepages.com
