I'm certainly intrigued by Spark, I just don't know much about it. Hazelcast is 
attractive because it lets me think in comfortable abstractions like the Java 
Collections Framework. But maybe that is limiting... time to take a look at:

https://spark.apache.org/docs/latest/graphx-programming-guide.html

ajs6f

> On Jan 23, 2017, at 1:08 PM, Andy Seaborne <[email protected]> wrote:
> 
> Hazelcast offers scan+filter (like the original MySQL!) which can be a great
> help, pushing filters over to the storage.  Combined with sort, then merge 
> joins in the client (bonus: with skipping sections of index) would be 
> interesting.
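[The scan+filter-plus-client-side-merge-join idea above can be sketched roughly as follows. This is a minimal, hypothetical illustration, not anything from Jena or Hazelcast: it joins two already-sorted key lists, as a client might after the store has applied its filters; a real index merge could additionally skip ahead (e.g. by galloping search) rather than advancing one element at a time.]

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: a sort-merge join over two pre-sorted key lists,
// as might run client-side after the store has applied scan+filter.
public class MergeJoinSketch {
    // Returns keys present in both sorted inputs. A smarter variant would
    // skip whole index sections instead of stepping one element at a time.
    static List<Integer> mergeJoin(List<Integer> left, List<Integer> right) {
        List<Integer> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < left.size() && j < right.size()) {
            int cmp = left.get(i).compareTo(right.get(j));
            if (cmp == 0) { out.add(left.get(i)); i++; j++; }
            else if (cmp < 0) i++;   // advance the side with the smaller key
            else j++;
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(mergeJoin(Arrays.asList(1, 3, 5, 7),
                                     Arrays.asList(3, 4, 5, 8))); // [3, 5]
    }
}
```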
> 
> If going this route, what about Apache Spark?  More powerful operators.
> 
>    Andy
> 
> On 22/01/17 16:54, A. Soroka wrote:
>> Another idea with which I have been playing is to try to scale horizontally 
>> but only in-memory:
>> 
>> I could take the one-writable-graph-per-transaction dataset code I've
>> written and replace the ConcurrentHashMap that currently holds the graphs 
>> with a Hazelcast [1] distributed map. Naive union graph performance would be 
>> awful, but if the workload was chiefly addressing individual graphs and the 
>> graphs were large enough, the parallelism might be really worthwhile.
>> 
>> Hazelcast offers per-entry locks [2], so those could be used instead of the 
>> lockable graphs I'm using now. It also offers optimistic locking via
>> Map.replace( key, oldValue, newValue ), so I could even imagine offering a 
>> switch between "strict mode" in which locks are used and "read-heavy mode" 
>> in which it is assumed that the application will prevent contention on 
>> individual graphs but that an update could fail if that isn't so.
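[The "read-heavy mode" above can be sketched with the standard ConcurrentMap contract, which Hazelcast's IMap also implements; a plain ConcurrentHashMap stands in here so the sketch is self-contained. The graph values are modeled as plain strings purely for illustration.]

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.UnaryOperator;

// Sketch of optimistic "read-heavy mode": update a graph entry via
// ConcurrentMap.replace(key, oldValue, newValue), which fails rather than
// blocks when another writer has changed the entry in the meantime.
public class OptimisticUpdate {
    // Returns false if the graph is absent or was concurrently modified;
    // in "read-heavy mode" the application is expected to handle that.
    static boolean updateGraph(ConcurrentMap<String, String> graphs,
                               String name, UnaryOperator<String> change) {
        String old = graphs.get(name);
        if (old == null) return false;
        return graphs.replace(name, old, change.apply(old));
    }

    public static void main(String[] args) {
        ConcurrentMap<String, String> graphs = new ConcurrentHashMap<>();
        graphs.put("g1", "v1");
        System.out.println(updateGraph(graphs, "g1", v -> v + "+triple")); // true
        System.out.println(graphs.get("g1")); // v1+triple
    }
}
```

["Strict mode" would instead take a per-entry lock around the read-modify-write; the optimistic variant trades that safety for no lock traffic on the common, uncontended path.]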
>> 
>> Hazelcast also offers some support for remote computation at the entries of 
>> its distributed maps [3], so it might be possible to distribute 
>> findInSpecificNamedGraph() executions (maybe eventually some of the ARQ 
>> execution as well?). It also supports a kind of query language [4] that 
>> might be used to obtain more efficiency, perhaps by using Bloom filters for 
>> graphs, as Claude has discussed before.
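[A rough local analogue of the two Hazelcast features above, so the shape is concrete without a cluster: the in-place update resembles an entry processor (Hazelcast runs the function on the member owning the key), and the scan resembles a distributed query (each member would evaluate the predicate over its own partition). All names here are illustrative, not Hazelcast or Jena API.]

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Predicate;

public class EntryProcessorSketch {
    // In-place update at the entry, analogous to sending an entry processor
    // to the key's owner instead of round-tripping the value to the client.
    static String addItem(ConcurrentMap<String, String> graphs,
                          String name, String item) {
        return graphs.compute(name, (k, v) -> v == null ? item : v + "," + item);
    }

    // Scan+filter over entries, analogous to a distributed query: returns
    // the (sorted) names of graphs whose value matches the predicate.
    static List<String> graphsMatching(ConcurrentMap<String, String> graphs,
                                       Predicate<String> match) {
        List<String> names = new ArrayList<>();
        for (Map.Entry<String, String> e : graphs.entrySet())
            if (match.test(e.getValue())) names.add(e.getKey());
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) {
        ConcurrentMap<String, String> graphs = new ConcurrentHashMap<>();
        addItem(graphs, "g1", "s-p-o");
        addItem(graphs, "g1", "s-p-o2");
        addItem(graphs, "g2", "x-y-z");
        System.out.println(graphs.get("g1"));                               // s-p-o,s-p-o2
        System.out.println(graphsMatching(graphs, v -> v.contains("x-y-z"))); // [g2]
    }
}
```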
>> 
>> All just food for thought, for now.
>> 
>> ---
>> A. Soroka
>> 
>> [1] https://hazelcast.org/
>> [2] 
>> http://docs.hazelcast.org/docs/3.7/manual/html-single/index.html#locking-maps
>> [3] 
>> http://docs.hazelcast.org/docs/3.7/manual/html-single/index.html#entry-processor
>> [4] 
>> http://docs.hazelcast.org/docs/3.7/manual/html-single/index.html#distributed-query
>> 
>>> On Jan 20, 2017, at 8:38 PM, De Gyves <[email protected]> wrote:
>>> 
>>> I'd like to participate in the storage portion of Jena, maybe TDB. As I
>>> have worked for many years developing with RDBMSs, I'd like to explore
>>> new horizons of persistence; graph-based stores seem very promising for
>>> my next projects, so I'd like to use SPARQL and RDF with Jena/TDB and see
>>> how far I can go.
>>> 
>>> So I've spent the last two days exploring the jena-dev mail archives from
>>> August 2015 to January of this year and found some interesting threads,
>>> such as the development of TDB2, the 100M BSBM data tests, a question
>>> about horizontal scaling, and the fact that anything implementing
>>> DatasetGraph can be used as a triple store. My reading of the Jena
>>> documentation so far includes: SPARQL, the RDF API, Txn, and TDB
>>> transactions.
>>> 
>>> What I am looking for is to get a clear perspective of some requirements
>>> which are taken for granted on a traditional RDBMS. These are:
>>> 
>>> 1. Atomicity, consistency, isolation, and durability of a transaction on
>>> a single TDB database: apart from the limitations noted in the
>>> documentation for TDB Transactions and Txn, are there current issues?
>>> Edge cases detected but not yet covered?
>>> 2. Are there currently available strategies for achieving a
>>> horizontally scaled TDB database?
>>> 3. What do you think of trying to implement horizontal scalability with
>>> DatasetGraph or something else, backed by, say, CockroachDB, VoltDB,
>>> PostgreSQL, etc.?
>>> 4. If there are stress tests available (e.g. I read about a 100M BSBM
>>> test), are they included in the source, or may I have a copy? I'd like
>>> to see what the limits of the current TDB are, and maybe of TDB2:
>>> maximum on-disk size of a dataset, maximum number of nodes in a dataset,
>>> or of models or graphs in a dataset, and the limiting behavior of a
>>> typical read/write transaction vs. the number of nodes, datasets,
>>> etcetera. Or some guidelines, so I can start to create this stress code.
>>> Would it be useful to you as well?
>>> 
>>> --
>>> Víctor-Polo de Gyvés Montero.
>>> +52 (55) 4926 9478 (Cellphone in Mexico city)
>>> Address: Daniel Delgadillo 7 6A, Agricultura neighborhood, Miguel Hidalgo
>>> burough
>>> ZIP: 11360, México City.
>>> 
>>> http://degyves.googlepages.com
>> 
