Hi Essam,

thanks for your interest in the project. Before jumping into other discussions, I'd like to ask about the performance issues you currently have with Marmotta: Which version are you using? Which database are you using with KiWi? As Sebastian said, Marmotta 3.3.0 and 3.4.0-SNAPSHOT contain many improvements for working over PostgreSQL, so you would benefit from switching if you are still using H2 (which honestly was never meant for anything more complex than demos).

That said, we are a very small project maintaining a large code base, so we have to explore paths that we can both benefit from and maintain in the long term. Don't get me wrong, Ostrich is a step forward in load time, but it still needs a lot of work to improve query time. SPARQL is too expressive to always be evaluated in reasonable time, especially when using clauses like ORDER BY or DISTINCT.

From what I have read about Apache Ignite, it can automatically integrate with external databases via JCache. So I think it could be worth first trying to wrap KiWi with Ignite. What do you think? Maybe it could be an idea for GSoC 2016...

Cheers,

On Wed, Feb 3, 2016 at 1:53 PM, Sebastian Schaffert <[email protected]> wrote:

> Hi Essam,
>
> Query performance mostly depends on the complexity of the query and the
> amount of data you are querying.
>
> The KiWi backend is translating a SPARQL query into SQL with all advantages
> and disadvantages (SQL offers a rich expressiveness and supports features
> like sorting and grouping). It is meant for medium amounts of data and more
> complex queries. Please see http://marmotta.apache.org/kiwi/sparql.html for
> information, in particular the section "performance considerations".
>
> The new Ostrich backend is designed for large amounts of data, but mostly
> simple queries. It won't offer you performance benefits if you are using
> e.g. ORDER BY or GROUP BY on large sets of candidates (the complexity is in
> the nature of the problem and not easily addressed). It will give you
> blazing speed when you have simple queries (i.e. simple join style queries)
> and strict limits.
>
> Ignite sounds interesting and promises very good performance. I still have
> doubts about it being much faster for evaluating complex SPARQL queries
> though. It's worth a try implementing it as a backend, but don't expect any
> wonders.
>
> Ostrich is using highly efficient indexing and key range queries that won't
> benefit much from an in-memory store, especially not if it is distributed
> and therefore incurs network overhead. The reason Ostrich works so fast for
> certain queries is that LevelDB isn't a pure key/value store, it's a
> key/value store allowing to query for partial keys and doing key range
> queries. As far as I can see, Ignite is a key/value store only, and I
> imagine it is hard to implement range queries over it.
>
> The best way to improve SPARQL performance is probably to use Ostrich and
> work on implementing/improving its SPARQL query planner, which at the
> moment is very simple. One example would be to reorder patterns in the
> WHERE part so the most selective pattern is applied first. This would
> require tracking triple statistics. Another example would be to drop
> DISTINCT for cases where it is not needed, or to use the index for ORDER
> BY. I'll see what I can do in this area in the next weeks, but no promises.
> :)
>
> Cheers,
>
> Sebastian
>
> 2016-02-03 6:02 GMT+01:00 Elsherif, Essam (ELS-NYC) <[email protected]>:
>
> > Hi Sebastian,
> >
> > Currently I am working on a Semantic Chemical Structure Search Engine
> > where I am trying to use Marmotta as Triple Store and linked data
> > platform. The SPARQL search queries run very slowly. I was reading on
> > the forum that you added LevelDB integration which runs much faster. I
> > am going to test that soon. However, I have been exploring for some time
> > Apache Ignite <https://ignite.apache.org/>, which is essentially a
> > distributed in-memory data fabric. I wonder if you would like to
> > collaborate on using Apache Ignite as backend store for Marmotta, just
> > like H2, KiWi, and Ostrich. I think this would give the best performance
> > gain. Please let me know, and we can have a discussion about it.
> >
> > Thanks,
> >
> > *Essam Elsherif*
> >
> > Solutions Architect – Elsevier Life Science Solutions
> >
> > 240 W. 37th Street, 2nd Floor
> >
> > New York, NY 10018
> >
> > [email protected] │ email
> >
> > +1 646 380 3759 │ office
> > +1 646 873 0421 │ mobile
> >
>

--
Sergio Fernández
Partner Technology Manager
Redlink GmbH
m: +43 6602747925
e: [email protected]
w: http://redlink.co
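
On Sebastian's suggestion to reorder the WHERE patterns so the most selective one is applied first: a minimal Python sketch of the idea follows. It is not how Ostrich or KiWi plan queries today. The function names `estimate_cardinality` and `reorder_patterns`, the per-predicate statistics table, the "divide by 10 per bound term" heuristic, and the `ex:` predicates are all hypothetical and only there for illustration.

```python
from typing import Dict, List, Optional, Tuple

# A triple pattern is (subject, predicate, object); None marks a variable.
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]


def estimate_cardinality(pattern: Pattern,
                         predicate_counts: Dict[str, int],
                         total_triples: int) -> int:
    """Rough guess of how many triples a pattern matches.

    Assumed heuristic (not taken from Marmotta): start from the tracked
    count for the predicate, or the total triple count if the predicate
    is a variable, then divide by 10 for each bound subject or object.
    """
    s, p, o = pattern
    estimate = predicate_counts.get(p, 0) if p is not None else total_triples
    for term in (s, o):
        if term is not None:
            estimate = max(1, estimate // 10)
    return estimate


def reorder_patterns(patterns: List[Pattern],
                     predicate_counts: Dict[str, int],
                     total_triples: int) -> List[Pattern]:
    """Sort patterns so the one with the smallest estimate is evaluated first."""
    return sorted(
        patterns,
        key=lambda pat: estimate_cardinality(pat, predicate_counts, total_triples),
    )


# Made-up statistics and patterns, purely for illustration.
counts = {"rdf:type": 1_000_000, "ex:hasInChIKey": 50_000}
where = [
    (None, None, None),
    (None, "rdf:type", "ex:Compound"),
    (None, "ex:hasInChIKey", "SOME-KEY"),
]
plan = reorder_patterns(where, counts, total_triples=5_000_000)
```

With these made-up numbers the `ex:hasInChIKey` pattern is moved to the front and the fully unbound pattern to the back. A real planner would also have to account for join variables shared between patterns, which this sketch ignores.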

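
On Sebastian's point about partial keys and key range queries: the toy Python sketch below uses a sorted list as a stand-in for an ordered store such as LevelDB. The `subject|predicate|object` key layout and the `prefix_scan` helper are assumptions for illustration, not Ostrich's actual index format. It only shows why ordered keys make "all triples for this subject" cheap, while a plain hash-based key/value lookup would need the full key.

```python
import bisect
from typing import List


def prefix_scan(sorted_keys: List[str], prefix: str) -> List[str]:
    """Return all keys that start with `prefix`.

    Because the keys are sorted, every match sits in one contiguous run
    that begins at the first key >= prefix, so we binary-search to that
    position and walk forward until the prefix no longer matches.
    """
    start = bisect.bisect_left(sorted_keys, prefix)
    end = start
    while end < len(sorted_keys) and sorted_keys[end].startswith(prefix):
        end += 1
    return sorted_keys[start:end]


# Hypothetical index: one key per triple, laid out as subject|predicate|object.
index = sorted([
    "ex:b|ex:name|Bob",
    "ex:a|ex:name|Alice",
    "ex:a|ex:knows|ex:b",
])

# Partial-key query: everything known about subject ex:a.
about_a = prefix_scan(index, "ex:a|")
```

Here `about_a` holds the two `ex:a` keys without touching the rest of the index. Covering other access patterns (by predicate or by object) would need additional indexes with a different key order.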