First, some background: I have spent most of the last 1.5 years looking at
Bloom filters and their properties. I have several software patents filed
by IBM. I am not a fan of software patents, but they pay me for them.

What I wanted to explore was whether I could reduce the number of indexes
by using Bloom filters. It is possible to build a system that uses Bloom
filters and indexes them using a single index. However, the overall speed
is not much better than the standard SDB implementation, though it does
scale better.

Bloom filters have a specific "shape" (
http://hur.st/bloomfilter?n=4&p=1.0E-20), defined either by the number of
bits and the number of hash functions, or by the number of items and the
probability of collision.
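For reference, the relationship between those two ways of stating the shape follows from the standard Bloom filter formulas. A small sketch (my addition, not part of the implementation being described):

```python
import math

def bloom_shape(n, p):
    """Optimal number of bits (m) and hash functions (k) for n items
    at false-positive probability p (standard Bloom filter formulas)."""
    m = math.ceil(-n * math.log(p) / math.log(2) ** 2)
    k = math.ceil(m / n * math.log(2))
    return m, k

print(bloom_shape(4, 1.0e-20))  # -> (384, 67)
```

So specifying (items, probability) determines (bits, hash functions), and vice versa: the two parameterisations describe the same shape.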

In the process I needed to be able to take a triple and build a Bloom
filter that represented the triple itself, as well as a Bloom filter that
represented each node in the triple. So I created a Bloom filter that had
3 elements and a low probability of collision; this filter represented the
triple. Each triple was placed on a "page", and the page had a Bloom
filter to determine whether the page contained the triple. Now, the
problem here is that while you can merge multiple filters of the same
shape into a single filter, that combined filter quickly becomes
overloaded and returns true for most items. This is why the Bloofi (
https://www.usna.edu/Users/cs/adina/research/Bloofi%20_CloudI2013.pdf)
index performs worse than a linear search for collections of 1000+ filters.
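The overload effect is easy to demonstrate with a toy version (this is an illustration, not the actual implementation; the shape is deliberately tiny and the hash is a stdlib stand-in for the real one):

```python
import hashlib

# Deliberately small shape (64 bits, 4 hash functions) so that merging a
# few same-shape filters visibly saturates the combined page filter.
M, K = 64, 4

def item_filter(item):
    """Bloom filter for a single item, as an int bitmask."""
    bits = 0
    for i in range(K):
        h = int.from_bytes(hashlib.sha256(f"{i}:{item}".encode()).digest()[:8], "big")
        bits |= 1 << (h % M)
    return bits

def might_contain(filt, item):
    """True if all of the item's bits are set in the filter."""
    probe = item_filter(item)
    return filt & probe == probe

page = 0
for t in range(50):                    # merge 50 item filters into one page filter
    page |= item_filter(f"triple-{t}")

false_hits = sum(might_contain(page, f"absent-{i}") for i in range(1000))
print(false_hits, "of 1000 absent items report 'maybe present'")
```

Most of the absent items come back as "maybe present": the merged filter has far too many bits set for its shape, which is the saturation problem that defeats Bloofi-style indexes at scale.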

I looked at the Cassandra Bloom filter code and realised that you can split
the filter into two parts: a proto filter and a concrete filter.

By using the murmur 128 hash function to encode the items to be placed in
the filter, you produce two long values. These two longs are the proto
filter.

Then, given a number of bits and a number of hash functions, that proto
filter can be converted into equivalent concrete filters of different
shapes. So the page filters could be much larger but still contain the
data from the smaller triple filters.
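A sketch of the proto/concrete split, assuming the double-hashing expansion ((h1 + i*h2) mod m) that Cassandra-style filters use to derive k bit positions from the two longs; sha256 stands in for murmur128 here since murmur is not in the Python stdlib, and the shapes are just examples:

```python
import hashlib

def proto_filter(item):
    """The proto filter: two 64-bit values per item.  (sha256 is a
    stand-in for the murmur 128 hash used by the real implementation.)"""
    d = hashlib.sha256(item.encode()).digest()
    return (int.from_bytes(d[:8], "big"), int.from_bytes(d[8:16], "big"))

def concrete_bits(proto, m, k):
    """Expand a proto filter into the bit positions of a concrete filter
    of shape (m bits, k hash functions) via double hashing."""
    h1, h2 = proto
    return {(h1 + i * h2) % m for i in range(k)}

p = proto_filter("<s> <p> <o>")
triple_bits = concrete_bits(p, 384, 67)   # a small, triple-sized shape
page_bits = concrete_bits(p, 16384, 67)   # same proto, a much larger page shape
```

The point is that the expensive hashing happens once per item; the same two longs can then be projected into any shape, so a small triple filter and a large page filter stay consistent.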

So the Bloom filter implementation I have is pages of filters fronting
triples. This was all stored on a single system. Now it should be
possible to consider each remote system as a "page" and perform the
queries as appropriate.
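In outline, the routing step might look like the following hypothetical sketch (names and shape are illustrative, not from the implementation): each remote system keeps a page filter, and a query is only sent to the systems whose filter might contain the triple.

```python
import hashlib

# Illustrative shape for the page filters; a stdlib hash stands in for murmur.
M, K = 4096, 7

def bits_for(item):
    """Bit positions an item sets in a filter of shape (M, K)."""
    return {
        int.from_bytes(hashlib.sha256(f"{i}:{item}".encode()).digest()[:8], "big") % M
        for i in range(K)
    }

class Page:
    """A remote system modelled as a 'page' with its own Bloom filter."""
    def __init__(self, name):
        self.name, self.bits = name, set()
    def add(self, triple):
        self.bits |= bits_for(triple)
    def might_contain(self, triple):
        return bits_for(triple) <= self.bits

pages = [Page("server-1"), Page("server-2")]
pages[0].add("<s1> <p> <o1>")
pages[1].add("<s2> <p> <o2>")

query = "<s1> <p> <o1>"
targets = [pg.name for pg in pages if pg.might_contain(query)]
print(targets)  # server-1 is always included; Bloom filters give no false negatives
```

A false positive only costs an unnecessary remote lookup; a system that actually holds the triple is never skipped.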


Claude

On Sun, Jan 22, 2017 at 7:50 PM, A. Soroka <[email protected]> wrote:

> I'm not sure how you are using those filters, Claude, but my idea (not
> specific at all to Hazelcast) would be to use Bloom (or maybe "cuckoo")
> filters at the server that receives a query or some subtree in a query
> and expects to distribute subtrees or branches to some other servers.
> The filters (associated to other partitions or servers) might prevent
> unnecessary network traffic to servers that don't have any relevant
> info. As I understand such schemes, it's only when the cost of
> retrieval is high enough (e.g. network or hard disk or _really_
> inefficient data representations) that the extra computations involved
> in filters are worth it.
>
> Is that like what you are doing with your implementation?
>
> ---
> A. Soroka
>
>
> > On Jan 22, 2017, at 1:39 PM, Claude Warren <[email protected]> wrote:
> >
> > I have a Bloom filter based graph implementation that runs on MySQL,
> > as there needs to be a server-side Bloom filter check and I could
> > easily implement a user-defined function there.
> >
> > I also have a new Bloom filter implementation that uses murmur 128
> > and creates proto Bloom filters that can be expressed as Bloom
> > filters of any shape.
> >
> > Last weekend I implemented a Bloom filter join (multi-level Bloom
> > filters -- so different shapes) but I have not tested it at scale and
> > I am not certain that it is any better than the standard hash join.
> > However, it should be possible to page out parts of the table so that
> > it can do very large joins efficiently.  But that got me wondering:
> > how big are "really large" joins?  Even with millions of triples, it
> > seems to me that the queries will only have thousands of entries to
> > join.
> >
> > Claude
> >
> > On Sun, Jan 22, 2017 at 4:54 PM, A. Soroka <[email protected]> wrote:
> >
> >> Another idea with which I have been playing is to try to scale
> >> horizontally but only in-memory:
> >>
> >> I could take the one-writable-graph-per-transaction dataset code
> >> I've written and replace the ConcurrentHashMap that currently holds
> >> the graphs with a Hazelcast [1] distributed map. Naive union graph
> >> performance would be awful, but if the workload was chiefly
> >> addressing individual graphs and the graphs were large enough, the
> >> parallelism might be really worthwhile.
> >>
> >> Hazelcast offers per-entry locks [2], so those could be used instead
> >> of the lockable graphs I'm using now. It also offers optimistic
> >> locking via Map.replace( key, oldValue, newValue ), so I could even
> >> imagine offering a switch between "strict mode", in which locks are
> >> used, and "read-heavy mode", in which it is assumed that the
> >> application will prevent contention on individual graphs but that an
> >> update could fail if that isn't so.
> >>
> >> Hazelcast also offers some support for remote computation at the
> >> entries of its distributed maps [3], so it might be possible to
> >> distribute findInSpecificNamedGraph() executions (maybe eventually
> >> some of the ARQ execution as well?). It also supports a kind of
> >> query language [4] that might be used to obtain more efficiency,
> >> perhaps by using Bloom filters for graphs, as Claude has discussed
> >> before.
> >>
> >> All just food for thought, for now.
> >>
> >> ---
> >> A. Soroka
> >>
> >> [1] https://hazelcast.org/
> >> [2] http://docs.hazelcast.org/docs/3.7/manual/html-single/index.html#locking-maps
> >> [3] http://docs.hazelcast.org/docs/3.7/manual/html-single/index.html#entry-processor
> >> [4] http://docs.hazelcast.org/docs/3.7/manual/html-single/index.html#distributed-query
> >>
> >>> On Jan 20, 2017, at 8:38 PM, De Gyves <[email protected]> wrote:
> >>>
> >>> I'd like to participate on the storage portion of Jena, maybe TDB.
> >>> As I have worked many years developing with RDBMS, I'd like to
> >>> explore new horizons of persistence, and graph-based ones seem very
> >>> promising for my next projects, so I'd like to use SPARQL and RDF
> >>> with Jena/TDB and see how far I can go.
> >>>
> >>> So I've spent the last two days exploring subjects of the jena-dev
> >>> mail archives from August 2015 to January of this year and found
> >>> some interesting threads, such as the development of TDB2, the
> >>> tests of 100M of BSBM data, a question about horizontal scaling,
> >>> and that anything that implements DatasetGraph can be used for a
> >>> triple store. Some readings of the Jena docs include: SPARQL, The
> >>> RDF API, Txn and TDB transactions.
> >>>
> >>> What I am looking for is to get a clear perspective on some
> >>> requirements which are taken for granted on a traditional RDBMS.
> >>> These are:
> >>>
> >>> 1. Atomicity, consistency, isolation and durability of a
> >>> transaction on a single TDB database: apart from the limitations in
> >>> the documentation of TDB Transactions and Txn, are there current
> >>> issues? Edge cases detected and not yet covered?
> >>> 2. Are there currently available strategies to achieve a
> >>> horizontally scaled TDB database?
> >>> 3. What do you think of trying to implement horizontal scalability
> >>> with DatasetGraph or something else with, let's say, cockroachdb,
> >>> voltdb, postgresql, etc.?
> >>> 4. If there are some stress tests available, e.g. I read about a
> >>> 100M BSBM test, are they included in the src? Or may I have a copy?
> >>> I'd like to see what the limits of the current TDB are, and maybe
> >>> of TDB2: maximum size on disk of a dataset, max number of nodes on
> >>> a dataset, of models or graphs on a dataset, the limiting behavior
> >>> of a typical read/write transaction vs. the number of nodes,
> >>> datasets, etcetera. Or some guidelines, so I can start to create
> >>> this stress code. Will it be useful to you also?
> >>>
> >>> --
> >>> Víctor-Polo de Gyvés Montero.
> >>> +52 (55) 4926 9478 (Cellphone in Mexico City)
> >>> Address: Daniel Delgadillo 7 6A, Agricultura neighborhood, Miguel
> >>> Hidalgo borough
> >>> ZIP: 11360, México City.
> >>>
> >>> http://degyves.googlepages.com
> >>
> >>
> >
> >
> > --
> > I like: Like Like - The likeliest place on the web
> > <http://like-like.xenei.com>
> > LinkedIn: http://www.linkedin.com/in/claudewarren
>
>
>
>


-- 
I like: Like Like - The likeliest place on the web
<http://like-like.xenei.com>
LinkedIn: http://www.linkedin.com/in/claudewarren
