Hi Sebastian,

On Mon, Dec 14, 2015 at 12:48 AM, Sebastian Schaffert
<[email protected]> wrote:

> > * How updated are the LevelDB Debian packages?
>
> Reasonably updated. LevelDB is quite mature, so even older packages should
> be fine. I used the stock Debian jessie packages during development.
Good.

> > * Is there any way to get gRPC latest snapshots from any repo? Since it
> > looks that the latest release is not compatible, how long do you think
> > Google could take to release version 0.10.0?
>
> The Java side is available in the Sonatype snapshot repositories. I am not
> working on gRPC, so I don't know their release cycle. But I am guessing it
> will be frequent.

OK, so let's wait. In the meantime I have temporarily added the repo so that
the backend can be built:
https://git-wip-us.apache.org/repos/asf?p=marmotta.git;h=abb1f00

> > * I have some concerns about the build: is it possible to integrate the
> > cmake tasks in the Maven build?
>
> I wouldn't do this, because building C++ is much more system dependent than
> building Java. Even though the code in theory should compile on other
> platforms than Linux, I didn't try. And cmake does not provide any
> dependency resolution (just checking) for the packages the code depends on.
> I would rather build a Debian package (but we can only do this once gRPC
> 0.10.0 is released as Debian package as well, currently the latest version
> is 0.9.0 which contains a performance bug that causes problems for
> Marmotta).

Yes, that's one of the downsides of C++. So a Debian package or a Docker
container could be options to make this easier to deploy.

> > * It definitely increases the deployment complexity, from the typical two
> > layers (db-backend) to three (db-server-backend).
>
> I think there's a misunderstanding here. :)
>
> LevelDB is an embedded C++ database, compiled into Ostrich as a library,
> not a separate database server (see http://leveldb.org/). Ostrich uses it
> internally to create 4 key-value databases, allowing fast lookups and range
> queries over SPOC, CSPO, OPSC, and PCOS keys. It stores all database files
> into a configurable directory on the local disk.
>
> marmotta_persistence is therefore a self-contained database server itself
> without any further dependencies. You startup marmotta_persistence, and you
> startup the Marmotta Java backend, and you are done. So the deployment
> complexity is the same as with Postgres, just you use marmotta_persistence
> instead of Postgres. ;-)

Oh, sorry, I got it wrong. I thought the server required a LevelDB daemon,
but if it embeds the database as a library then the deployment architecture
is similar to what we have now. So, because of this and the previous point,
I definitely think it's worth exploring the container approach for Ostrich.

Just out of curiosity about the implementation details: if I understood
correctly, the triplestore model is quite similar to the one proposed by
LevelGraph (http://nodejsconfit.levelgraph.io/#16), isn't it? I've put a
small sketch of how I read the index layout further below, just to check my
understanding.

> > Respect to that, how mature/reliable/performance is the LevelDB port to
> > Java https://github.com/dain/leveldb to explore also that path?
>
> I wouldn't use it. It is a full reimplementation of LevelDB in Java and
> much slower and according to the website only trivially tested. You won't
> be able to come close to the performance of the C++ implementation in Java.

Yep, that's what I also read; I just wanted to put the alternative on the
table.
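Coming back to the index layout you describe above: here is a minimal sketch
against the plain LevelDB C++ API of how I read such a quad index. This is my
own illustration, not code from Ostrich, and all names, paths and IDs in it
are made up. The assumption is that quad components are dictionary-encoded to
64-bit IDs, each index is just a key ordering, and a triple pattern becomes a
prefix range scan on the index whose bound components come first.

```cpp
// Hypothetical sketch of a LevelGraph/Ostrich-style quad index on LevelDB.
// Only the SPOC ordering is shown; a real store would keep CSPO, OPSC and
// PCOS databases as well, so that every pattern has an index whose bound
// components form a key prefix.
#include <cstdint>
#include <iostream>
#include <string>

#include <leveldb/db.h>

// Append a 64-bit ID in big-endian order so that lexicographic key order
// matches numeric order.
static void append_id(std::string* key, uint64_t id) {
    for (int shift = 56; shift >= 0; shift -= 8) {
        key->push_back(static_cast<char>((id >> shift) & 0xff));
    }
}

// Build the SPOC key for a quad of dictionary-encoded IDs.
static std::string spoc_key(uint64_t s, uint64_t p, uint64_t o, uint64_t c) {
    std::string key;
    append_id(&key, s); append_id(&key, p);
    append_id(&key, o); append_id(&key, c);
    return key;
}

int main() {
    leveldb::Options options;
    options.create_if_missing = true;

    leveldb::DB* db = nullptr;
    leveldb::Status status = leveldb::DB::Open(options, "/tmp/spoc_index", &db);
    if (!status.ok()) {
        std::cerr << "open failed: " << status.ToString() << std::endl;
        return 1;
    }

    // Insert a few quads; the IDs stand in for dictionary-encoded resources.
    db->Put(leveldb::WriteOptions(), spoc_key(1, 10, 100, 0), "");
    db->Put(leveldb::WriteOptions(), spoc_key(1, 10, 101, 0), "");
    db->Put(leveldb::WriteOptions(), spoc_key(1, 11, 100, 0), "");
    db->Put(leveldb::WriteOptions(), spoc_key(2, 10, 100, 0), "");

    // Pattern (s=1, p=10, ?o, ?c): the bound components form a prefix of the
    // SPOC order, so we seek to that prefix and scan while keys match it.
    std::string prefix;
    append_id(&prefix, 1);
    append_id(&prefix, 10);

    leveldb::Iterator* it = db->NewIterator(leveldb::ReadOptions());
    int matches = 0;
    for (it->Seek(prefix);
         it->Valid() && it->key().starts_with(prefix);
         it->Next()) {
        ++matches;
    }
    std::cout << "quads matching (1, 10, ?, ?): " << matches << std::endl;  // 2

    delete it;
    delete db;
    return 0;
}
```

If that is roughly what marmotta_persistence does internally, then the
analogy with LevelGraph indeed looks quite close.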
> > * I fear maintaining another triple store would be just too much for our
> > small community. We already spent quite some effort in the SPARQL-SQL.
> >
> > Well, at least it's worth to explore new paths and some scenarios may
> > benefit from this contribution.
>
> I'll maintain it for the time being to see if there are actually good use
> cases. The reasoning behind this is that it would require a radical step to
> go beyond the current performance of the SQL backend (hence LevelDB and
> hence C++). There are simply certain limits to relational databases and we
> have tuned the KiWi backend almost to that limit. I can see that there are
> use cases where users can live with limited transaction support and
> features and trade it for performance of most common operations. Wouldn't
> it be nice to run a SPARQL server over DBPedia on a Raspberry? ;-)

Well, the relational database limits and feature set were well known when we
worked on improving KiWi. Personally I'm thinking more about the project
maintenance effort, given the small size of our community, but probably I'm
being too cautious.

> Since this is experimental and more work to maintain, I'd keep it as a
> separate Maven profile, though (like we do with BigData, Titan, etc).

Yes, I'd keep it in a profile for sure. Let's see how much traction it gets
before deciding on further steps.

Superb work!

--
Sergio Fernández
Partner Technology Manager
Redlink GmbH
m: +43 6602747925
e: [email protected]
w: http://redlink.co
