Hazelcast offers scan+filter (like the original MySQL!), which can be a
great help by pushing filters down to the storage. Combined with sort,
and then merge joins in the client (bonus: skipping sections of the index),
that would be interesting.
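To make the merge-join idea concrete, here is a minimal Java sketch. It assumes the storage layer (Hazelcast or otherwise) has already applied the filter and returned each side's join keys sorted ascending. The names `MergeJoinSketch`, `mergeJoin` and `skipTo` are hypothetical, and the binary-search `skipTo` only stands in for "skipping sections of the index"; it is not a Hazelcast API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: client-side merge join over two key arrays that the
// storage layer has already filtered and sorted ascending.
final class MergeJoinSketch {

    // Returns index pairs {i, j} such that left[i] == right[j].
    static List<int[]> mergeJoin(long[] left, long[] right) {
        List<int[]> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < left.length && j < right.length) {
            long a = left[i], b = right[j];
            if (a < b) {
                i = skipTo(left, i, b);   // jump past every left key < b
            } else if (a > b) {
                j = skipTo(right, j, a);  // jump past every right key < a
            } else {
                // Find the runs of equal keys on both sides.
                int iEnd = i, jEnd = j;
                while (iEnd < left.length && left[iEnd] == a) iEnd++;
                while (jEnd < right.length && right[jEnd] == a) jEnd++;
                // Emit the cross product of the two runs.
                for (int x = i; x < iEnd; x++) {
                    for (int y = j; y < jEnd; y++) {
                        out.add(new int[] {x, y});
                    }
                }
                i = iEnd;
                j = jEnd;
            }
        }
        return out;
    }

    // Lower-bound binary search: first index >= from whose key is >= target.
    // Stands in for skipping whole sections instead of stepping one by one.
    static int skipTo(long[] keys, int from, long target) {
        int lo = from, hi = keys.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (keys[mid] < target) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    public static void main(String[] args) {
        long[] left = {1, 3, 3, 5, 9};
        long[] right = {3, 4, 5, 5, 10};
        for (int[] p : mergeJoin(left, right)) {
            System.out.println(p[0] + " " + p[1]);
        }
    }
}
```

With the sample keys in `main`, this prints the index pairs `1 0`, `2 0`, `3 2` and `3 3`. Because both inputs arrive sorted, the two cursors only ever move forward, which is what makes pushing the sort down to the storage attractive.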
If going this route, what about Apache Spark? It has more powerful operators.
Andy
On 22/01/17 16:54, A. Soroka wrote:
Another idea I have been playing with is to try to scale horizontally, but
only in-memory:
I could take the one-writable-graph-per-transaction dataset code I've written
and replace the ConcurrentHashMap that currently holds the graphs with a
Hazelcast [1] distributed map. Naive union-graph performance would be awful,
but if the workload chiefly addressed individual graphs and the graphs
were large enough, the parallelism might be really worthwhile.
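A minimal sketch of why the swap could be small: if the dataset code depends only on the `java.util.concurrent.ConcurrentMap` interface, a `ConcurrentHashMap` and a Hazelcast `IMap` (which extends `ConcurrentMap`) are interchangeable at construction time. This is not the actual dataset code: graphs are modeled as plain sets of triple strings, and `NamedGraphStore`, `findInGraph` and `findInUnion` are hypothetical names, not Jena's API. The sketch also shows the cost asymmetry: a per-graph lookup touches one entry, while a naive union must visit every entry, which on a distributed map means every member.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical sketch: named graphs kept in any ConcurrentMap.
// A graph is simplified to an immutable set of triple strings.
final class NamedGraphStore {
    private final ConcurrentMap<String, Set<String>> graphs;

    // Pass a ConcurrentHashMap for a single JVM, or (assumption) a Hazelcast
    // IMap<String, Set<String>> for a distributed, in-memory store.
    NamedGraphStore(ConcurrentMap<String, Set<String>> graphs) {
        this.graphs = graphs;
    }

    void putGraph(String name, Set<String> triples) {
        graphs.put(name, Set.copyOf(triples));
    }

    // Addresses a single entry: cheap even when the map is partitioned.
    List<String> findInGraph(String name, Predicate<String> pattern) {
        Set<String> g = graphs.getOrDefault(name, Set.of());
        return g.stream().filter(pattern).sorted().collect(Collectors.toList());
    }

    // Naive union: iterates over every entry, i.e. every partition.
    List<String> findInUnion(Predicate<String> pattern) {
        return graphs.values().stream()
                .flatMap(Set::stream)
                .filter(pattern)
                .distinct()
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        NamedGraphStore store = new NamedGraphStore(new ConcurrentHashMap<>());
        store.putGraph("g1", Set.of("ex:a ex:p ex:b", "ex:b ex:p ex:c"));
        store.putGraph("g2", Set.of("ex:a ex:p ex:b", "ex:d ex:q ex:e"));
        System.out.println(store.findInGraph("g1", t -> t.startsWith("ex:a ")));
        System.out.println(store.findInUnion(t -> t.contains(" ex:p ")));
    }
}
```

The first call reads only `g1`; the second has to pull from both graphs and deduplicate, which is the part that would get expensive once the entries live on different members.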
Hazelcast offers per-entry locks [2], so those could be used instead of the lockable graphs I'm
using now. It also offers optimistic locking via Map.replace( key, oldValue, newValue ), so I
could even imagine offering a switch between a "strict mode", in which locks are used, and a
"read-heavy mode", in which it is assumed that the application will prevent contention on
individual graphs, but an update could fail if that isn't so.
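The two modes can be sketched in plain Java, with a `ConcurrentHashMap` standing in for the Hazelcast map. In "strict mode" a per-key `ReentrantLock` serializes writers to one graph (a local stand-in for Hazelcast's per-entry `lock(key)`/`unlock(key)` [2]); in "read-heavy mode" the write is a single `replace(key, oldValue, newValue)`, which succeeds only if nobody changed the graph since it was read, and otherwise reports failure to the caller. `GraphUpdater`, `updateStrict`, `updateOptimistic` and `adding` are hypothetical names, and graphs are again simplified to immutable sets of strings.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.UnaryOperator;

// Hypothetical sketch of "strict" (pessimistic) vs "read-heavy" (optimistic)
// updates of one named graph held as an immutable value in a ConcurrentMap.
final class GraphUpdater {
    private final ConcurrentMap<String, Set<String>> graphs = new ConcurrentHashMap<>();
    // Local stand-in for per-entry locks; a distributed map would lock the entry itself.
    private final ConcurrentMap<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    Set<String> get(String name) {
        return graphs.getOrDefault(name, Set.of());
    }

    // Strict mode: hold the graph's lock for the whole read-modify-write.
    void updateStrict(String name, UnaryOperator<Set<String>> change) {
        ReentrantLock lock = locks.computeIfAbsent(name, k -> new ReentrantLock());
        lock.lock();
        try {
            graphs.put(name, Set.copyOf(change.apply(get(name))));
        } finally {
            lock.unlock();
        }
    }

    // Read-heavy mode: no lock; the write succeeds only if the graph still
    // equals the value we read. Returns false on contention.
    boolean updateOptimistic(String name, UnaryOperator<Set<String>> change) {
        Set<String> before = graphs.get(name);
        if (before == null) {
            return graphs.putIfAbsent(name, Set.copyOf(change.apply(Set.of()))) == null;
        }
        return graphs.replace(name, before, Set.copyOf(change.apply(before)));
    }

    // Builds a change that returns a copy of the graph with one triple added.
    static UnaryOperator<Set<String>> adding(String triple) {
        return g -> {
            Set<String> copy = new HashSet<>(g);
            copy.add(triple);
            return copy;
        };
    }

    public static void main(String[] args) {
        GraphUpdater u = new GraphUpdater();
        u.updateStrict("g1", adding("ex:a ex:p ex:b"));
        System.out.println(u.updateOptimistic("g1", adding("ex:b ex:p ex:c"))); // true
        // Simulate a concurrent writer sneaking in between our read and our write.
        boolean ok = u.updateOptimistic("g1", g -> {
            u.updateStrict("g1", adding("ex:x ex:p ex:y"));
            return adding("ex:c ex:p ex:d").apply(g);
        });
        System.out.println(ok); // false
        System.out.println(u.get("g1").size()); // 3
    }
}
```

The design point is that the optimistic path never blocks readers or other writers; the price is that the caller must handle a `false` return (retry, or surface the conflict), which matches the assumption that the application keeps contention on individual graphs rare.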
Hazelcast also offers some support for remote computation at the entries of its
distributed maps [3], so it might be possible to distribute
findInSpecificNamedGraph() executions (maybe eventually some of the ARQ
execution as well?). It also supports a kind of query language [4] that might
be used to obtain more efficiency, perhaps by using Bloom filters for graphs,
as Claude has discussed before.
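One way the Bloom-filter idea could work, sketched under assumptions: each named graph keeps a small Bloom filter over the terms it contains, and a lookup first asks every filter whether the term might be present, running the real find only on graphs that answer yes. A Bloom filter never gives a false negative, so no matches are lost; a false positive only costs an unnecessary find. On Hazelcast the filter check is the kind of cheap work that might run next to the data via an entry processor [3], but that wiring is not shown. `GraphBloomFilter` and `candidateGraphs` are hypothetical names, and the double-hashing scheme (with the second hash derived from `String.hashCode()`) is one simple choice, not something from the thread.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: a per-graph Bloom filter over RDF terms (as strings).
final class GraphBloomFilter {
    private final BitSet bits;
    private final int size;
    private final int hashes;

    GraphBloomFilter(int size, int hashes) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashes = hashes;
    }

    // Double hashing: the i-th bit index is h1 + i * h2 (mod size).
    // h2 is derived from h1, which is weaker than two independent hashes.
    private int index(String term, int i) {
        int h1 = term.hashCode();
        int h2 = Integer.rotateLeft(h1 * 0x9E3779B9, 16) | 1; // kept odd
        return Math.floorMod(h1 + i * h2, size);
    }

    void add(String term) {
        for (int i = 0; i < hashes; i++) {
            bits.set(index(term, i));
        }
    }

    // false means "definitely absent"; true means "possibly present".
    boolean mightContain(String term) {
        for (int i = 0; i < hashes; i++) {
            if (!bits.get(index(term, i))) {
                return false;
            }
        }
        return true;
    }

    // Names of the graphs whose filter says the term might be present.
    static List<String> candidateGraphs(Map<String, GraphBloomFilter> filters, String term) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, GraphBloomFilter> e : filters.entrySet()) {
            if (e.getValue().mightContain(term)) {
                out.add(e.getKey());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, GraphBloomFilter> filters = new TreeMap<>();
        GraphBloomFilter g1 = new GraphBloomFilter(1024, 3);
        g1.add("ex:alice");
        g1.add("ex:knows");
        filters.put("g1", g1);
        filters.put("g2", new GraphBloomFilter(1024, 3)); // empty graph
        System.out.println(candidateGraphs(filters, "ex:alice")); // [g1]
    }
}
```

Running `main` prints `[g1]`: the empty graph `g2` is ruled out without being searched. With real data the filter size and number of hash functions would need tuning against the number of terms per graph to keep the false-positive rate low.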
All just food for thought, for now.
---
A. Soroka
[1] https://hazelcast.org/
[2] http://docs.hazelcast.org/docs/3.7/manual/html-single/index.html#locking-maps
[3] http://docs.hazelcast.org/docs/3.7/manual/html-single/index.html#entry-processor
[4] http://docs.hazelcast.org/docs/3.7/manual/html-single/index.html#distributed-query
On Jan 20, 2017, at 8:38 PM, De Gyves wrote:
I'd like to participate in the storage portion of Jena, maybe TDB. As I
have worked for many years developing with RDBMSs, I'd like to explore new
horizons of persistence, and graph-based ones seem very promising for my
next projects, so I'd like to use SPARQL and RDF with Jena/TDB and see how
far I can go.
So I've spent the last two days exploring the jena-dev mail archives
from August 2015 to January of this year and found some interesting
threads, such as the development of TDB2, the tests with 100M of BSBM
data, a question about horizontal scaling, and the point that anything that
implements DatasetGraph can be used as a triple store. The Jena
documentation I have read includes: SPARQL, the RDF API, Txn and TDB transactions.
What I am looking for is a clear perspective on some requirements
that are taken for granted in a traditional RDBMS. These are:
1. Atomicity, consistency, isolation and durability of a transaction on a
single TDB database: apart from the limitations described in the documentation
of TDB Transactions and Txn, are there current issues, or edge cases that have
been detected but not yet covered?
2. Are there currently available strategies to achieve a horizontally
scaled TDB database?
3. What do you think of trying to implement horizontal scalability with
DatasetGraph or something else with, let's say, CockroachDB, VoltDB,
PostgreSQL, etc.?
4. Are there stress tests available? E.g. I read about a 100M BSBM
test: is it included in the source, or may I have a copy of it? I'd like
to see what the limits of the current TDB are, and maybe of TDB2: maximum
size on disk of a dataset, maximum number of nodes in a dataset, of models or
graphs in a dataset, the limiting behavior of a typical read/write
transaction vs. the number of nodes, datasets, etcetera. Or, some
guidelines, so I can start to create this stress code. Would it be useful to
you as well?
--
Víctor-Polo de Gyvés Montero.