Re: Marmotta and Apache Ignite

Sebastian Schaffert Wed, 03 Feb 2016 04:53:36 -0800

Hi Essam,

Query performance mostly depends on the complexity of the query and the
amount of data you are querying.

The KiWi backend translates a SPARQL query into SQL, with all the advantages
and disadvantages that brings (SQL is richly expressive and supports features
like sorting and grouping). It is meant for medium amounts of data and more
complex queries. Please see http://marmotta.apache.org/kiwi/sparql.html for
more information, in particular the section "performance considerations".

The new Ostrich backend is designed for large amounts of data, but mostly
simple queries. It won't offer you performance benefits if you are using
e.g. ORDER BY or GROUP BY on large sets of candidates (the complexity is in
the nature of the problem and not easily addressed). It will give you
blazing speed when you have simple queries (i.e. simple join style queries)
and strict limits.

Ignite sounds interesting and promises very good performance. I still doubt
that it would be much faster at evaluating complex SPARQL queries, though.
It's worth trying to implement it as a backend, but don't expect miracles.

Ostrich uses highly efficient indexing and key range queries that won't
benefit much from an in-memory store, especially not if it is distributed
and therefore incurs network overhead. The reason Ostrich is so fast for
certain queries is that LevelDB isn't a pure key/value store: it is a
key/value store that also supports lookups by partial key and key range
queries. As far as I can see, Ignite is a key/value store only, and I
imagine it would be hard to implement range queries on top of it.
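
To make the partial-key point concrete, here is a minimal sketch of a prefix
scan over sorted keys. It is only an illustration, not Ostrich's or LevelDB's
actual code: the SortedKV class, the (subject, predicate, object) key layout,
and the prefix_scan name are all assumptions made up for this example.

```python
import bisect


class SortedKV:
    """Toy ordered key/value store (hypothetical stand-in for a sorted-key store)."""

    def __init__(self):
        self._keys = []   # keys kept in sorted order
        self._vals = {}   # key -> value

    def put(self, key, value):
        # Insert the key at its sorted position the first time it is seen.
        if key not in self._vals:
            bisect.insort(self._keys, key)
        self._vals[key] = value

    def prefix_scan(self, prefix):
        # Seek to the first key >= prefix, then walk forward while the
        # leading components of the key still match the prefix.
        i = bisect.bisect_left(self._keys, prefix)
        n = len(prefix)
        while i < len(self._keys) and self._keys[i][:n] == prefix:
            key = self._keys[i]
            yield key, self._vals[key]
            i += 1


store = SortedKV()
# Assumed key layout: (subject, predicate, object) tuples.
store.put(("alice", "knows", "bob"), True)
store.put(("alice", "knows", "carol"), True)
store.put(("alice", "name", "Alice"), True)
store.put(("bob", "knows", "alice"), True)

# All triples matching the pattern: alice knows ?x
matches = [key for key, _ in store.prefix_scan(("alice", "knows"))]
```

Because the keys are sorted, the scan touches only the matching entries plus
one seek. A store that offers only get/put by exact key would have to
enumerate candidate keys some other way.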

The best way to improve SPARQL performance is probably to use Ostrich and
work on implementing/improving its SPARQL query planner, which at the
moment is very simple. One example would be to reorder the patterns in the
WHERE clause so that the most selective pattern is applied first. This would
require tracking triple statistics. Another example would be to drop
DISTINCT in cases where it is not needed, or to use the index for ORDER
BY. I'll see what I can do in this area in the coming weeks, but no
promises. :)
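
As a rough sketch of the pattern reordering idea (not the Ostrich planner;
build_stats, estimate, and reorder are hypothetical names, and the cost
estimate is deliberately naive), one could count how often each term occurs
in each position and sort the patterns by the smallest count among their
bound terms:

```python
from collections import Counter


def build_stats(triples):
    """Count occurrences of each term per position (s, p, o)."""
    stats = {"s": Counter(), "p": Counter(), "o": Counter()}
    for s, p, o in triples:
        stats["s"][s] += 1
        stats["p"][p] += 1
        stats["o"][o] += 1
    return stats


def estimate(pattern, stats, total):
    """Upper bound on matches: the rarest bound term limits the result size.

    Terms starting with '?' are variables and do not restrict anything.
    """
    est = total
    for pos, term in zip("spo", pattern):
        if not term.startswith("?"):
            est = min(est, stats[pos].get(term, 0))
    return est


def reorder(patterns, stats, total):
    """Put the most selective (lowest estimate) pattern first."""
    return sorted(patterns, key=lambda pat: estimate(pat, stats, total))


triples = [
    ("a", "type", "Person"),
    ("b", "type", "Person"),
    ("c", "type", "Person"),
    ("a", "name", "Alice"),
    ("a", "knows", "b"),
]
stats = build_stats(triples)

where = [("?x", "type", "Person"), ("?x", "knows", "b")]
plan = reorder(where, stats, len(triples))
```

A real planner would also have to take into account which variables are
already bound by earlier patterns; this sketch only looks at each pattern in
isolation.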

Cheers,

Sebastian



2016-02-03 6:02 GMT+01:00 Elsherif, Essam (ELS-NYC) <[email protected]>:

> Hi Sebastian,
>
> Currently I am working on a Semantic Chemical Structure Search Engine
> where I am trying to use Marmotta as a triple store and linked data
> platform. The SPARQL search queries run very slowly. I was reading on the
> forum that you added LevelDB integration, which runs much faster. I am
> going to test that soon. However, I have been exploring Apache Ignite
> <https://ignite.apache.org/> for some time, which is essentially a
> distributed in-memory data fabric. I wonder if you would like to
> collaborate on using Apache Ignite as a backend store for Marmotta, just
> like H2, KiWi, and Ostrich. I think this would give the best performance
> gain. Please let me know, and we can have a discussion about it.
>
>
>
> Thanks,
>
> *Essam Elsherif*
>
> Solutions Architect – Elsevier Life Science Solutions
>
> 240 W. 37th Street, 2nd Floor
>
> New York, NY  10018
>
> [email protected]      ￨ email
>
> +1 646 380 3759                     ￨ office
> +1 646 873 0421                     ￨ mobile
>
>
>