On 09/06/15 16:23, A. Soroka wrote:
Is there some "high level" overview of Lizard/Mantis/TDB2 yet extant? Like the 
kind of thing we might see at a conference?


http://www.slideshare.net/andyseaborne/201411-apache-coneu-lizard

and the code on github (currently in my account).

In any event, thanks for working on this-- it's great to know that Jena will be 
able to cluster soon.

I was recently looking at bulk loading. TDB2 loads at 65 kTPS (thousand triples per second) with 3 indexes. On the same machine, Lizard, running all server nodes in the same JVM and still using Thrift/TCP networking to connect them, loads at 115 kTPS (no indexes), 100 kTPS (2 indexes) and 95 kTPS (3 indexes).

The difference is parallelism - Lizard loads the indexes in bulk units with only the parser and node table on the main thread. The bulk transfers and the service nodes are all separate threads. Some or all of that approach applies to TDB2. Whether it is better to make TDB2 = Lizard with in-JVM comms or still a separate project, I don't know.
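
A rough sketch of that loading shape, for illustration only: the parser and node table stay on the main thread, and each index gets its own worker thread that receives triples in bulk units. This is not Lizard or TDB2 code. The names (`load`, `index_worker`, `BATCH_SIZE`, the three index orders) are hypothetical, the "indexes" are plain sorted lists, and the queues stand in for the Thrift/TCP transfers.

```python
import queue
import threading

BATCH_SIZE = 3  # hypothetical bulk-unit size; tiny so the example is easy to trace

# Index name -> positions of (subject, predicate, object) in that index's key order.
ORDERS = {"SPO": (0, 1, 2), "POS": (1, 2, 0), "OSP": (2, 0, 1)}


def index_worker(order, inbox, index):
    """Consume bulk units of node-id triples and add them in this index's order."""
    while True:
        batch = inbox.get()
        if batch is None:  # sentinel: the main thread has no more data
            break
        for triple in batch:
            index.append(tuple(triple[i] for i in order))
    index.sort()  # stand-in for a real sorted index structure


def load(triples):
    node_table = {}  # term -> node id; only touched by the main thread
    indexes = {name: [] for name in ORDERS}
    inboxes = {name: queue.Queue(maxsize=4) for name in ORDERS}
    workers = [
        threading.Thread(target=index_worker,
                         args=(ORDERS[name], inboxes[name], indexes[name]))
        for name in ORDERS
    ]
    for w in workers:
        w.start()

    batch = []
    for s, p, o in triples:  # stands in for the parser's output
        ids = tuple(node_table.setdefault(t, len(node_table)) for t in (s, p, o))
        batch.append(ids)
        if len(batch) == BATCH_SIZE:
            for q in inboxes.values():
                q.put(batch)  # workers only read the batch, so sharing it is safe
            batch = []
    if batch:  # flush the final partial unit
        for q in inboxes.values():
            q.put(batch)
    for q in inboxes.values():
        q.put(None)
    for w in workers:
        w.join()
    return node_table, indexes
```

In CPython the global interpreter lock limits how much this actually speeds anything up; the point is only the division of work between the main thread and the per-index workers.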

        Andy

All figures are approximate and indicative only (only a few runs).
They are all loading an empty database with 100 million BSBM triples, gzip compressed, inside a write transaction.

The empty database is just for uniformity. TDB2 does not have a separate bulk loader: (1) the TDB1 one has not been ported and (2) I am seeing whether one is needed.

Hardware: quad-core i7, 32 GB RAM, SSD. The BSBM data is streamed from rotational disk; the raw parser speed is 315 kTPS.
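
For a feel of what those rates mean in wall-clock terms, here is a back-of-the-envelope conversion. It assumes each rate is sustained over the whole 100 million triples, which a few runs do not establish; the function and dictionary names are made up for the example.

```python
TRIPLES = 100_000_000  # size of the BSBM load quoted in this thread

# Rates in kTPS (thousand triples per second), as quoted in this thread.
RATES_KTPS = {
    "TDB2, 3 indexes": 65,
    "Lizard, no indexes": 115,
    "Lizard, 2 indexes": 100,
    "Lizard, 3 indexes": 95,
    "raw parser": 315,
}


def load_minutes(ktps, triples=TRIPLES):
    """Minutes to load `triples` at a constant rate of `ktps` thousand triples/s."""
    return triples / (ktps * 1000) / 60


for name, ktps in RATES_KTPS.items():
    print(f"{name}: {load_minutes(ktps):.1f} min")
```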


---
A. Soroka
The University of Virginia Library

On Jun 8, 2015, at 1:24 PM, Andy Seaborne wrote:

On 08/06/15 17:48, Marco Neumann wrote:
is TDB2 going to replace TDB or is TDB2 a new cluster product?

Whatever people (users, developers) want.  Migrating databases is not as easy as 
upgrading code.  Running oaj.tdb and oaj.tdb2 side by side is possible.

(TDB2 is itself 7 maven modules ATM - some can be combined as they are small and just 
"a good idea at the time").

TDB2 is not the cluster (that's Lizard).  Mantis started as the separation out of the low 
level code needed for Lizard. Initially it was a validation of the reworked transactions and 
data structures; a little extra work has made it viable as "TDB2".

        Andy

(oaj = org.apache.jena)


Marco

On Mon, Jun 8, 2015 at 11:41 AM, Andy Seaborne wrote:
Informational announcement: TDB2

TDB2 is a reworking of TDB based on updated implementations of transactions
and transactional data structures for project Lizard (a clustered SPARQL
store).

TDB2 has:

* Arbitrary scale write-once transactions
* New transaction system - can add other first class components.
   (e.g. text indexes, cache tables)
* Models work across transaction boundaries
* Cleaner, simpler, more maintainable

TDB2 databases are not compatible with TDB databases.  It uses a more
efficient encoding for RDF terms.  [1]

Being a database, the new indexing and transaction code needs time to settle
to bring the maturity up.  I'm using that tech in Lizard development.

         Andy

TDB2 code:
https://github.com/afs/mantis/tree/master/tdb2

Lizard slides:
http://www.slideshare.net/andyseaborne/201411-apache-coneu-lizard


[1] An upgrade path using TDB1-style encoding is possible; it is a one-way
upgrade path and not reversible [2].  TDB2 adds control files for the
copy-on-write data structures that TDB1 does not understand.

[2] Actually, if the encoding is compatible, what will happen is that TDB1
will see the database at the time of the upgrade.  Welcome to copy-on-write
immutable data structures.
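
To illustrate the effect described in [2], here is a minimal sketch of a copy-on-write tree using path copying. It is a generic illustration and not TDB2's actual storage format: the `Node` class, `insert`, `keys`, and the two root variables are hypothetical. A reader that only knows the root as it was at upgrade time keeps seeing that snapshot, however many updates are made afterwards.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    """Immutable binary-search-tree node; never modified after creation."""
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(root, key):
    """Return a new root. Nodes on the search path are copied; the rest are shared."""
    if root is None:
        return Node(key)
    if key < root.key:
        return Node(root.key, insert(root.left, key), root.right)
    if key > root.key:
        return Node(root.key, root.left, insert(root.right, key))
    return root  # key already present: nothing to copy


def keys(root):
    """In-order listing of the keys reachable from this root."""
    if root is None:
        return []
    return keys(root.left) + [root.key] + keys(root.right)


# State at the time of the (hypothetical) upgrade.
root_at_upgrade = None
for k in (5, 2, 8):
    root_at_upgrade = insert(root_at_upgrade, k)

# Later writes produce new roots; the old root is untouched.
current_root = insert(insert(root_at_upgrade, 9), 1)

print(keys(root_at_upgrade))  # what a reader holding the old root sees
print(keys(current_root))     # what a reader holding the new root sees
```

In this picture, the control files mentioned in [1] play the role of recording which root is current; a reader that does not understand them is left holding the old root.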





