On 05/26/2013 12:58 AM, Vishesh Handa wrote:
Hey guys
I have made a very important discovery - The Storage service is a big
bottleneck!
Running a query such as - 'select * where { graph ?g { ?r ?p ?o. } }
LIMIT 50000' by directly connecting to virtuoso via ODBC takes about
2.65 seconds. In contrast running the same query by using the Nepomuk
ResourceManager's main model takes about 19.5 seconds.
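
For reference, the two timings above can be turned into a speedup factor. This is only a quick arithmetic sketch using the figures quoted in this mail; the function and variable names are made up for illustration.

```python
def speedup(slow_seconds: float, fast_seconds: float) -> float:
    """Return how many times faster the fast path is than the slow one."""
    if fast_seconds <= 0:
        raise ValueError("fast_seconds must be positive")
    return slow_seconds / fast_seconds


# Timings quoted above for the LIMIT 50000 query.
via_storage_service = 19.5  # seconds, through the Nepomuk main model
via_odbc = 2.65             # seconds, direct ODBC connection to virtuoso

ratio = speedup(via_storage_service, via_odbc)
print(f"direct ODBC is about {ratio:.1f}x faster")  # about 7.4x
```

That lands inside the 6-8x range mentioned in the list of pros.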
Nepomuk internally uses the Soprano::LocalSocketClient to connect to the
storage service which runs a Soprano::LocalServer.
I've been trying to optimize this Soprano code for some time now, and
since 4.9 we have gained a good 200% performance increase. But we can
increase it a LOT more by just communicating directly with virtuoso.
Pros -
* 6-8x performance upgrade
* The storage service no longer uses so much CPU when reading
* Accurate reporting - Suppose app 'x' does a costly query which
requires a large number of results, then only 'x' will have high CPU
consumption. Currently both NepomukStorage and 'x' have very high CPU
consumption.
Cons -
* Less Control - By having all queries go through the Nepomuk Storage we
could theoretically build amazing tools to tell us which query is
executing and how long it is taking. However, no such tool has ever been
written - so we won't be losing anything.
Before 4.10 this could never have been done because we used to have a
lot of code in the storage service which handled removable media and
other devices. This code would often modify the sparql queries and
modify the results. With 4.10, I threw away all that code.
Comments?
PS: This is only for read-only operations. All writes should still go
through the storage service. Though maybe we want to change that as well?
My 2 cents:
You could even do this for write operations but then you would need
clients to always use a client library which does all the checks and
notifications. I suppose this is fine, but of course it requires writing,
for example, a Python lib. Alternatively you could support both: direct
ODBC writes via C++, slower writes via the server (internally using the
C++ client lib) for everyone else (for example scripts).
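
To make the "support both" idea a bit more concrete, here is a rough sketch of the layering I have in mind. It is written in Python only for brevity, and every name in it (InMemoryStore, ClientLib, StorageServer) is hypothetical; the in-memory list stands in for virtuoso, and the check is a placeholder for whatever checks the real library would perform.

```python
from typing import Callable, List, Tuple

Triple = Tuple[str, str, str]


class InMemoryStore:
    """Stand-in for the database; a real client would talk ODBC instead."""

    def __init__(self) -> None:
        self.triples: List[Triple] = []

    def insert(self, triple: Triple) -> None:
        self.triples.append(triple)


class ClientLib:
    """Hypothetical client library: each write is checked, stored, then announced."""

    def __init__(self, store: InMemoryStore) -> None:
        self._store = store
        self._listeners: List[Callable[[Triple], None]] = []

    def subscribe(self, listener: Callable[[Triple], None]) -> None:
        self._listeners.append(listener)

    def write(self, triple: Triple) -> None:
        # Placeholder check: three non-empty strings.
        if len(triple) != 3 or not all(isinstance(p, str) and p for p in triple):
            raise ValueError("a statement needs three non-empty strings")
        self._store.insert(triple)
        for listener in self._listeners:
            listener(triple)


class StorageServer:
    """Slower path for clients without the library (for example scripts)."""

    def __init__(self, client: ClientLib) -> None:
        self._client = client

    def handle_write_request(self, subject: str, predicate: str, obj: str) -> None:
        # The server does no checking of its own; it reuses the client library.
        self._client.write((subject, predicate, obj))


store = InMemoryStore()
client = ClientLib(store)
seen: List[Triple] = []
client.subscribe(seen.append)

client.write(("res:a", "prop:b", "res:c"))                # direct path
server = StorageServer(client)
server.handle_write_request("res:d", "prop:e", "res:f")   # via the server
```

Both paths end up in the same write() call, so the checks and notifications live in exactly one place.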
All in all it seems like a good idea. I always liked the modular system
with the storage service, but let's face it: it's a performance drain
and in the end does not give us much besides a nice design.
Cheers,
Sebastian