I have a Bloom filter based graph implementation that runs on MySQL, since
there needs to be a server-side Bloom filter check and I could easily
implement a user-defined function there.

I also have a new Bloom filter implementation that uses Murmur 128 and
creates proto-Bloom filters that can be expressed as Bloom filters of any
shape.
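
As a rough sketch of that proto-filter idea, assuming the 128-bit hash is
computed once per item and only reshaped later (MD5 stands in for Murmur 128
here since both are 128-bit and MurmurHash3 is not in the JDK; the class and
method names are hypothetical, not from the actual implementation):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.BitSet;

// Sketch: a proto filter keeps only the two 64-bit halves of one 128-bit
// hash per item; a concrete filter of any shape (m bits, k hash functions)
// is derived later by double hashing: index_i = (h1 + i * h2) mod m.
public class ProtoBloom {

    // MD5 stands in for Murmur 128 (both produce 128 bits); real code
    // would use MurmurHash3 x64_128, which is not in the JDK.
    static long[] hash128(String item) {
        try {
            byte[] d = MessageDigest.getInstance("MD5")
                    .digest(item.getBytes(StandardCharsets.UTF_8));
            long h1 = 0, h2 = 0;
            for (int i = 0; i < 8; i++) {
                h1 = (h1 << 8) | (d[i] & 0xFFL);
                h2 = (h2 << 8) | (d[i + 8] & 0xFFL);
            }
            return new long[] { h1, h2 };
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Shape the proto hash into bit positions for an m-bit, k-hash filter.
    static int[] indices(long[] proto, int m, int k) {
        int[] idx = new int[k];
        for (int i = 0; i < k; i++) {
            idx[i] = (int) Math.floorMod(proto[0] + (long) i * proto[1], (long) m);
        }
        return idx;
    }

    public static void main(String[] args) {
        long[] p = hash128("urn:example:subject");
        // The same proto hash expressed as two differently shaped filters.
        BitSet small = new BitSet(64), big = new BitSet(1024);
        for (int i : indices(p, 64, 3)) small.set(i);
        for (int i : indices(p, 1024, 5)) big.set(i);
        System.out.println(small.cardinality() + " / " + big.cardinality());
    }
}
```

The point is that the expensive hashing happens once; deriving a filter of a
different shape is just cheap modular arithmetic over the stored pair.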

Last weekend I implemented a Bloom filter join (multi-level Bloom filters,
so different shapes), but I have not tested it at scale and I am not certain
that it is any better than a standard hash join.  However, it should be
possible to page out parts of the table so that it can do very large joins
efficiently.  That got me wondering: how big are "really large" joins?
Even with millions of triples, it seems to me that the queries will only
have thousands of entries to join.
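
For reference, the basic single-level shape of a Bloom-filtered hash join is
below (a minimal sketch, not the multi-level implementation described above;
all names are hypothetical):

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch of a Bloom-filtered hash join: build a filter over the build
// side's join keys, use it to drop most non-matching probe rows cheaply,
// then confirm survivors against an exact hash table.
public class BloomJoin {
    final BitSet bits;
    final int m, k;

    BloomJoin(int m, int k) {
        this.m = m;
        this.k = k;
        this.bits = new BitSet(m);
    }

    // Cheap double hashing from hashCode; real code would use Murmur 128.
    private int idx(Object key, int i) {
        int h1 = key.hashCode();
        int h2 = Integer.rotateLeft(h1, 16) | 1; // force an odd step
        return Math.floorMod(h1 + i * h2, m);
    }

    void add(Object key) {
        for (int i = 0; i < k; i++) bits.set(idx(key, i));
    }

    boolean mightContain(Object key) {
        for (int i = 0; i < k; i++) if (!bits.get(idx(key, i))) return false;
        return true; // no false negatives; occasional false positives
    }

    // Returns probe keys that also appear on the build side.
    static List<String> join(List<String> build, List<String> probe) {
        BloomJoin filter = new BloomJoin(1 << 16, 3);
        Set<String> table = new HashSet<>(build);
        build.forEach(filter::add);
        List<String> out = new ArrayList<>();
        for (String key : probe) {
            if (!filter.mightContain(key)) continue; // most misses end here
            if (table.contains(key)) out.add(key);   // exact check
        }
        return out;
    }
}
```

The filter only pays off when most probe rows miss; with small join inputs
(thousands of entries, as above) the plain hash table may well win.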

Claude

On Sun, Jan 22, 2017 at 4:54 PM, A. Soroka <[email protected]> wrote:

> Another idea with which I have been playing is to try to scale
> horizontally but only in-memory:
>
> I could take the one-writable-graph-per-transaction dataset code I've
> written and replace the ConcurrentHashMap that currently holds the graphs
> with a Hazelcast [1] distributed map. Naive union graph performance would
> be awful, but if the workload was chiefly addressing individual graphs and
> the graphs were large enough, the parallelism might be really worthwhile.
>
> Hazelcast offers per-entry locks [2], so those could be used instead of
> the lockable graphs I'm using now. It also offers optimistic locking
> via Map.replace( key, oldValue, newValue ), so I could even imagine
> offering a switch between "strict mode" in which locks are used and
> "read-heavy mode" in which it is assumed that the application will prevent
> contention on individual graphs but that an update could fail if that isn't
> so.
>
> Hazelcast also offers some support for remote computation at the entries
> of its distributed maps [3], so it might be possible to distribute
> findInSpecificNamedGraph() executions (maybe eventually some of the ARQ
> execution as well?). It also supports a kind of query language [4] that
> might be used to obtain more efficiency, perhaps by using Bloom filters for
> graphs, as Claude has discussed before.
>
> All just food for thought, for now.
>
> ---
> A. Soroka
>
> [1] https://hazelcast.org/
> [2] http://docs.hazelcast.org/docs/3.7/manual/html-single/index.html#locking-maps
> [3] http://docs.hazelcast.org/docs/3.7/manual/html-single/index.html#entry-processor
> [4] http://docs.hazelcast.org/docs/3.7/manual/html-single/index.html#distributed-query
>
> > On Jan 20, 2017, at 8:38 PM, De Gyves <[email protected]> wrote:
> >
> > I'd like to participate in the storage portion of Jena, maybe TDB. As I
> > have worked many years developing with RDBMS, I'd like to explore new
> > horizons of persistence, and graph-based ones seem very promising for my
> > next projects, so I'd like to use SPARQL and RDF with Jena/TDB and see
> > how far I can go.
> >
> > So I've spent the last two days exploring the jena-dev mail archives
> > from August 2015 to January of this year and found some interesting
> > threads, such as the development of TDB2, the tests of 100M of BSBM
> > data, a question about horizontal scaling, and the fact that anything
> > that implements DatasetGraph can be used as a triple store. Some
> > readings of the Jena docs include: SPARQL, the RDF API, Txn, and TDB
> > transactions.
> >
> > What I am looking for is to get a clear perspective of some requirements
> > which are taken for granted on a traditional RDBMS. These are:
> >
> > 1. Atomicity, consistency, isolation, and durability of a transaction on
> > a single TDB database: apart from the limitations noted in the
> > documentation of TDB Transactions and Txn, are there current issues?
> > Edge cases detected but not yet covered?
> > 2. Are there currently available strategies for achieving a horizontally
> > scaled TDB database?
> > 3. What do you think of trying to implement horizontal scalability with
> > DatasetGraph or something else, using, say, CockroachDB, VoltDB,
> > PostgreSQL, etc.?
> > 4. If there are stress tests available (e.g., I read about a 100M BSBM
> > test), are they included in the source, or may I have a copy? I'd like
> > to see what the limits of the current TDB are, and maybe of TDB2: the
> > maximum on-disk size of a dataset, the maximum number of nodes in a
> > dataset, of models or graphs in a dataset, the limiting behavior of a
> > typical read/write transaction vs. the number of nodes, datasets,
> > etcetera. Or some guidelines, so I can start to create this stress code.
> > Would it be useful to you as well?
> >
> > --
> > Víctor-Polo de Gyvés Montero.
> > +52 (55) 4926 9478 (Cellphone in Mexico city)
> > Address: Daniel Delgadillo 7 6A, Agricultura neighborhood, Miguel Hidalgo
> > borough
> > ZIP: 11360, México City.
> >
> > http://degyves.googlepages.com
>
>


-- 
I like: Like Like - The likeliest place on the web
<http://like-like.xenei.com>
LinkedIn: http://www.linkedin.com/in/claudewarren
