Hi Sergio,

2015-12-13 21:17 GMT+01:00 Sergio Fernández <[email protected]>:

> Hi Sebastian,
>
> Great that you finally had time to contribute this great stuff to Apache
> Marmotta! Ostrich is definitely a much better name than cmarmotta. I created
> MARMOTTA-621 to track the contribution and its integration into the source
> base.
>
> I still didn't have the chance to run it, but I have some questions:
>
> * How up to date are the LevelDB Debian packages?
>

Reasonably up to date. LevelDB is quite mature, so even older packages should
be fine. I used the stock Debian jessie packages during development.


>
> * Is there any way to get the latest gRPC snapshots from any repo? Since it
> looks like the latest release is not compatible, how long do you think
> Google will take to release version 0.10.0?
>

The Java side is available in the Sonatype snapshot repositories. I am not
working on gRPC myself, so I don't know their release cycle, but I am guessing
releases will be frequent.


> * I have some concerns about the build: is it possible to integrate the
> cmake tasks into the Maven build?
>

I wouldn't do this, because building C++ is much more system-dependent than
building Java. Even though the code should in theory compile on platforms
other than Linux, I didn't try. And cmake does not provide any dependency
resolution (only checking) for the packages the code depends on. I would
rather build a Debian package (but we can only do this once gRPC 0.10.0 is
released as a Debian package as well; currently the latest version is 0.9.0,
which contains a performance bug that causes problems for Marmotta).


>
> * It definitely increases the deployment complexity, from the typical two
> layers (db-backend) to three (db-server-backend).


I think there's a misunderstanding here. :)

LevelDB is an embedded C++ database, compiled into Ostrich as a library,
not a separate database server (see http://leveldb.org/). Ostrich uses it
internally to create four key-value databases, allowing fast lookups and
range queries over SPOC, CSPO, OPSC, and PCOS keys. It stores all database
files in a configurable directory on the local disk.
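
To make this concrete, here is a minimal, self-contained sketch of the
pattern (not the actual Ostrich code; the directory path and the key layout
are made up for illustration): one index database is opened as an ordinary
directory on disk and queried with a prefix range scan.

// Minimal sketch only - not the actual Ostrich code. Path and key layout
// are invented for illustration. Build e.g.: g++ -std=c++11 demo.cc -lleveldb
#include <iostream>
#include <memory>
#include <string>

#include <leveldb/db.h>

int main() {
  leveldb::Options options;
  options.create_if_missing = true;

  // The "database" is just a directory on the local disk; no separate
  // server process is involved.
  leveldb::DB* raw = nullptr;
  leveldb::Status status =
      leveldb::DB::Open(options, "/tmp/ostrich-spoc-demo", &raw);
  if (!status.ok()) {
    std::cerr << "open failed: " << status.ToString() << std::endl;
    return 1;
  }
  std::unique_ptr<leveldb::DB> db(raw);

  // Hypothetical SPOC keys: subject|predicate|object|context concatenated,
  // so all quads with the same subject form one contiguous key range.
  db->Put(leveldb::WriteOptions(), "s1|p1|o1|c1", "");
  db->Put(leveldb::WriteOptions(), "s1|p2|o2|c1", "");
  db->Put(leveldb::WriteOptions(), "s2|p1|o3|c1", "");

  // Range query: all quads with subject "s1" via a prefix scan.
  leveldb::Slice prefix("s1|");
  std::unique_ptr<leveldb::Iterator> it(db->NewIterator(leveldb::ReadOptions()));
  for (it->Seek(prefix); it->Valid() && it->key().starts_with(prefix); it->Next()) {
    std::cout << it->key().ToString() << std::endl;
  }
  return 0;
}

The point of keeping four key orders (SPOC, CSPO, OPSC, PCOS) is that for the
common triple patterns there is an index whose key prefix covers the bound
components, so the lookup turns into a range scan like the one above.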

Because LevelDB is embedded, marmotta_persistence is itself a self-contained
database server without any further dependencies. You start up
marmotta_persistence, you start up the Marmotta Java backend, and you are
done. The deployment complexity is therefore the same as with Postgres,
except that you use marmotta_persistence instead of Postgres. ;-)

Have a look at
https://github.com/apache/marmotta/blob/develop/libraries/ostrich/backend/persistence/leveldb_persistence.h
and
https://github.com/apache/marmotta/blob/develop/libraries/ostrich/backend/persistence/leveldb_persistence.cc
to see the details.


> With respect to that, how mature/reliable/performant is the LevelDB port to
> Java (https://github.com/dain/leveldb), in case we also want to explore that
> path?
>

I wouldn't use it. It is a full reimplementation of LevelDB in Java, much
slower, and according to the website only trivially tested. You won't be able
to come close to the performance of the C++ implementation in Java.



>
> * I fear maintaining another triple store would be just too much for our
> small community. We already spent quite some effort on SPARQL-SQL.
>
> Well, at least it's worth exploring new paths, and some scenarios may
> benefit from this contribution.
>

I'll maintain it for the time being to see if there are actually good use
cases. The reasoning behind this is that going beyond the current performance
of the SQL backend requires a radical step (hence LevelDB, and hence C++).
There are simply certain limits to relational databases, and we have tuned
the KiWi backend almost to that limit. I can see use cases where users can
live with limited transaction support and a reduced feature set, and trade
them for performance on the most common operations. Wouldn't it be nice to
run a SPARQL server over DBpedia on a Raspberry Pi? ;-)

Since this is experimental and more work to maintain, I'd keep it as a
separate Maven profile, though (like we do with BigData, Titan, etc.).


> Cheers,
>
>
>
> On Sat, Dec 12, 2015 at 6:40 PM, Sebastian Schaffert
> <[email protected]> wrote:
>
> > Update: SPARQL tuple (SELECT) queries are now supported natively
> > (experimental). All other kinds of SPARQL queries are still using the
> > Sesame in-memory evaluation.
> >
> > 2015-12-12 17:22 GMT+01:00 Sebastian Schaffert <[email protected]>:
> >
> > > Hi all,
> > >
> > > I've been working on it for a long time outside the main Marmotta
> > > tree, but even though it is still experimental, it is now mature enough
> > > to be included in the development repository of Marmotta: a new triple
> > > store backend implemented in C++ and using LevelDB
> > > (http://www.leveldb.org). In analogy to KiWi, I named it Ostrich -
> > > another bird without wings, but one that runs very fast :)
> > >
> > > The Ostrich backend is ultra fast compared to KiWi (I can import 500k
> > > triples in 7 seconds), but it does not provide the same feature set. In
> > > particular, the following restrictions apply:
> > > - limited transaction support; a transaction stays active only while
> > > executing updates, but as soon as you run a query on a connection it
> > > will auto-commit
> > > - currently emulated in-memory SPARQL (I started working on direct C++
> > > SPARQL support, but this is not yet available in Java; performance is
> > > promising, though, so more to come :) )
> > > - currently emulated LDPath support (I might implement LDPath in C++ if
> > > the emulated performance is not good enough)
> > > - currently no reasoner (it's certainly possible, but a lot of work)
> > > - currently no versioning or snapshotting (might be possible at the
> > > LevelDB level, but I didn't investigate much)
> > >
> > > The new backend consists of a C++ part (server) and a Java part
> > > (client). Client and server communicate with each other using Proto
> > > messages and gRPC (latest snapshot version!). The data model and service
> > > definitions are in the .proto files found in libraries/ostrich/backend.
> > >
> > > If you want to try this out, please have a look at the README.md file
> > > located in libraries/ostrich/backend. Besides compiling the C++ code
> > > separately with cmake and make, Marmotta needs to be compiled with
> > >
> > > mvn clean install -Postrich -DskipTests
> > >
> > > Note that the Java code contains tests, but these require a running
> > > backend, so for now it is better to just skip the tests when building
> > > the whole Java part.
> > >
> > > Bulk imports are best done with the C++ command line client (see
> > > README.md).
> > >
> > > Have fun!
> > >
> > > Sebastian
> > >
> > >
> > >
> > > P.S. You can now also use the client to try out native SPARQL support:
> > >
> > > ./client/marmotta_client sparql 'select * where { ?s ?p ?o } limit 10'
> > >
> > > The result will be a mostly unreadable text formatted dump of the
> > > resulting proto messages :)
> > >
> > >
> > >
> > >
> >
>
>
>
> --
> Sergio Fernández
> Partner Technology Manager
> Redlink GmbH
> m: +43 6602747925
> e: [email protected]
> w: http://redlink.co
>
