On 09/06/15 16:23, [email protected] wrote:
Is there some "high level" overview of Lizard/Mantis/TDB2 yet extant? Like the
kind of thing we might see at a conference?
http://www.slideshare.net/andyseaborne/201411-apache-coneu-lizard
and the code on github (currently in my account).
In any event, thanks for working on this-- it's great to know that Jena will be
able to cluster soon.
I was recently looking at bulk loading. TDB2 loads at 65 kTPS (thousand
triples per second) with 3 indexes, but on the same machine, Lizard,
running all server nodes in the same JVM and still using Thrift/TCP
networking to connect, is loading at 115 kTPS (no indexes), 100 kTPS
(2 indexes) and 95 kTPS (3 indexes).
The difference is parallelism - Lizard loads the indexes in bulk units
with only the parser and node table on the main thread. The bulk
transfers and the service nodes are all separate threads. Some or all
of that approach applies to TDB2. Whether it is better to make TDB2 =
Lizard with in-JVM comms or still a separate project, I don't know.
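
As a rough illustration of the threading pattern described above (this is
not Lizard or TDB2 code), here is a minimal Java sketch. The class and method
names are hypothetical, the "parser" just generates synthetic node ids, and
an in-memory TreeSet stands in for each index. The calling thread produces
batches; one worker thread per index consumes them.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch: not Lizard or TDB2 code.
final class ParallelIndexLoadSketch {

    // Sentinel batch: tells a worker that no more data will arrive.
    private static final List<long[]> END = new ArrayList<>();

    // Lexicographic order over 3-element keys.
    private static final Comparator<long[]> KEY_ORDER = (a, b) -> {
        for (int i = 0; i < 3; i++) {
            int c = Long.compare(a[i], b[i]);
            if (c != 0) return c;
        }
        return 0;
    };

    // One worker thread per index. Only the worker touches its own index.
    static final class IndexWorker extends Thread {
        final int[] order;  // e.g. {1, 2, 0} reorders (s,p,o) into (p,o,s)
        final BlockingQueue<List<long[]>> queue = new ArrayBlockingQueue<>(4);
        final TreeSet<long[]> index = new TreeSet<>(KEY_ORDER);

        IndexWorker(int... order) { this.order = order; }

        @Override
        public void run() {
            try {
                for (List<long[]> batch = queue.take(); batch != END; batch = queue.take()) {
                    for (long[] t : batch) {
                        index.add(new long[] { t[order[0]], t[order[1]], t[order[2]] });
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    // "Parses" tripleCount synthetic triples on the calling thread and hands
    // them to the index workers in batches. Returns the size of each index.
    static int[] load(int tripleCount, int batchSize) {
        IndexWorker[] workers = {
            new IndexWorker(0, 1, 2),  // SPO
            new IndexWorker(1, 2, 0),  // POS
            new IndexWorker(2, 0, 1)   // OSP
        };
        for (IndexWorker w : workers) w.start();
        try {
            List<long[]> batch = new ArrayList<>(batchSize);
            for (int i = 0; i < tripleCount; i++) {
                // Stand-in for parser + node table: node ids instead of RDF terms.
                batch.add(new long[] { i, i % 7, i % 13 });
                if (batch.size() == batchSize) {
                    for (IndexWorker w : workers) w.queue.put(batch);
                    batch = new ArrayList<>(batchSize);
                }
            }
            if (!batch.isEmpty()) for (IndexWorker w : workers) w.queue.put(batch);
            for (IndexWorker w : workers) w.queue.put(END);
            for (IndexWorker w : workers) w.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("load interrupted", e);
        }
        int[] sizes = new int[workers.length];
        for (int i = 0; i < workers.length; i++) sizes[i] = workers[i].index.size();
        return sizes;
    }

    public static void main(String[] args) {
        int[] sizes = load(100_000, 1_000);
        System.out.println(sizes[0] + " " + sizes[1] + " " + sizes[2]);
    }
}
```

The bounded queues give back-pressure: if an index falls behind, the parsing
thread blocks instead of buffering without limit. The real system sends the
bulk units over Thrift/TCP; the queues here only stand in for that transfer.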
Andy
All figures are approximate and indicative only (only a few runs).
They are all loading an empty database with 100 million BSBM, gzip
compressed, inside a write transaction.
The empty database is just for uniformity. TDB2 does not have a
separate bulk loader: (1) it has not been ported from TDB1 and (2) I am
seeing whether one is needed.
Hardware: Quad core i7, 32 GB RAM, SSD. The BSBM data is streamed from
rotational disk; the raw parser speed is 315 kTPS.
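
To put the rates in terms of wall-clock time: assuming each rate is sustained
for the whole 100 million triples (an assumption; only the rates were
reported, not elapsed times), 65 kTPS works out to roughly 1,540 s (about 26
minutes), 95 kTPS to roughly 1,050 s (about 17.5 minutes), and the parser
alone at 315 kTPS to roughly 317 s. A small Java sketch of that arithmetic,
with hypothetical names:

```java
// Hypothetical helper: converts the quoted rates into estimated load times.
final class LoadTimeSketch {

    // Seconds to load `triples` at a constant rate of `triplesPerSecond`.
    static double loadSeconds(long triples, double triplesPerSecond) {
        return triples / triplesPerSecond;
    }

    public static void main(String[] args) {
        long triples = 100_000_000L; // 100 million BSBM triples
        String[] labels = { "TDB2, 3 indexes", "Lizard, 3 indexes", "raw parser" };
        double[] rates = { 65_000, 95_000, 315_000 }; // rates quoted in this thread
        for (int i = 0; i < rates.length; i++) {
            double s = loadSeconds(triples, rates[i]);
            System.out.printf("%-18s %7.0f s (%.1f min)%n", labels[i], s, s / 60);
        }
    }
}
```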
---
A. Soroka
The University of Virginia Library
On Jun 8, 2015, at 1:24 PM, Andy Seaborne <[email protected]> wrote:
On 08/06/15 17:48, Marco Neumann wrote:
is TDB2 going to replace TDB or is TDB2 a new cluster product?
Whatever people (users, developers) want. Migrating databases is not as easy as
upgrading code. Running oaj.tdb and oaj.tdb2 side by side
(TDB2 is itself 7 Maven modules ATM - some can be combined as they are small and just
"a good idea at the time").
TDB2 is not the cluster (that's Lizard). Mantis started as the separation out of the low
level code needed for Lizard. Initially it was a validation of the reworking of transactions
and data structures; a little extra work has made it viable as "TDB2".
Andy
(oaj = org.apache.jena)
Marco
On Mon, Jun 8, 2015 at 11:41 AM, Andy Seaborne <[email protected]> wrote:
Informational announcement: TDB2
TDB2 is a reworking of TDB based on updated implementations of transactions
and transactional data structures for project Lizard (a clustered SPARQL
store).
TDB2 has:
* Arbitrary scale write-once transactions
* New transaction system - can add other first class components.
(e.g. text indexes, cache tables)
* Models work across transaction boundaries
* Cleaner, simpler, more maintainable
TDB2 databases are not compatible with TDB databases. It uses a more
efficient encoding for RDF terms. [1]
Being a database, the new indexing and transaction code needs time to settle
to bring the maturity up. I'm using that tech in Lizard development.
Andy
TDB2 code:
https://github.com/afs/mantis/tree/master/tdb2
Lizard slides:
http://www.slideshare.net/andyseaborne/201411-apache-coneu-lizard
[1] An upgrade path using TDB1-style encoding is possible; it is a one-way
upgrade path and not reversible [2]. TDB2 adds control files for the
copy-on-write data structures that TDB1 does not understand.
[2] Actually, if the encoding is compatible, what will happen is that TDB1
will see the database at the time of the upgrade. Welcome to copy-on-write
immutable data structures.
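
The effect in footnote [2] can be shown with a toy example. The Java sketch
below is not TDB2's index code; it is a hypothetical path-copying binary
tree, used only to illustrate why a reader holding the old root keeps seeing
the old state after a copy-on-write update.

```java
// Hypothetical sketch of copy-on-write (path copying); not TDB2 code.
final class CowTreeSketch {

    // Immutable node: once built, it is never modified.
    static final class Node {
        final long key;
        final Node left, right;

        Node(long key, Node left, Node right) {
            this.key = key;
            this.left = left;
            this.right = right;
        }
    }

    // Returns a new root. Nodes on the search path are copied; everything
    // else is shared with the old tree, which stays valid and unchanged.
    static Node insert(Node root, long key) {
        if (root == null) return new Node(key, null, null);
        if (key < root.key) return new Node(root.key, insert(root.left, key), root.right);
        if (key > root.key) return new Node(root.key, root.left, insert(root.right, key));
        return root; // key already present: nothing to copy
    }

    static boolean contains(Node root, long key) {
        while (root != null) {
            if (key == root.key) return true;
            root = key < root.key ? root.left : root.right;
        }
        return false;
    }

    public static void main(String[] args) {
        Node atUpgrade = insert(insert(null, 2), 1); // state an old reader holds
        Node afterWrite = insert(atUpgrade, 3);      // later write makes a new root
        System.out.println(contains(afterWrite, 3)); // new root sees the write
        System.out.println(contains(atUpgrade, 3));  // old root does not
    }
}
```

In this analogy, TDB1 plays the part of the reader that only knows the old
root: it cannot follow the new control files, so later writes are invisible
to it.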