On 03/02/15 11:13, Saikat Maitra wrote:
Hello Andy,
I have built a prototype version of the SPARQL Query Cache implementation.
I have provided the implementation details below:
1. Created a CacheStore class with the utility operations doGet, doSet and
doUnset.
2. Created a CacheClient interface and implemented a client for a local
in-memory cache.
3. Created a CacheAction class with the enum values READ_CACHE and WRITE_CACHE.
4. Created a Cache class as a wrapper object to hold the cached result (a JSON
result as of now) and the SPARQLResultSet.
5. In SPARQL_Query I check whether the cache object is null or has been
initialised.
6. If it is null I set the WRITE_CACHE action and pass the cache object to
ResponseResultSet.
7. ResponseResultSet creates a StringBuilder object and passes it to
IndentedWriter.
8. As the query results are iterated and written to the ServletOutputStream,
I also append the data to the StringBuilder object.
9. Before flushing the data to the output stream I store the contents of the
StringBuilder, which holds the JSON result, in the cache and mark the cache
object as initialised.
10. If CacheStore already contains the data then I retrieve it from the
cache, write it to the ServletOutputStream and flush.
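Steps 5 to 10 can be sketched roughly as follows. This is a minimal illustration and not the code in the fork: QueryCacheSketch, respond and execute are hypothetical names, a PrintWriter stands in for the ServletOutputStream, and a plain map stands in for the CacheClient.

```java
import java.io.PrintWriter;
import java.util.Arrays;
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of steps 5-10; none of these names come from the fork.
class QueryCacheSketch {
    // Stand-in for the in-memory cache client: query string -> serialized result.
    private final Map<String, String> store = new ConcurrentHashMap<>();
    // Counts how often the (simulated) query is actually executed.
    int executions = 0;

    // Writes the response for the query to out, using the cache when possible.
    void respond(String query, PrintWriter out) {
        String cached = store.get(query);
        if (cached != null) {
            // Step 10: cache hit, replay the stored text and flush.
            out.print(cached);
            out.flush();
            return;
        }
        // Steps 6-8: cache miss, stream rows out and keep a copy as we go.
        StringBuilder copy = new StringBuilder();
        Iterator<String> rows = execute(query);
        while (rows.hasNext()) {
            String row = rows.next() + "\n";
            out.print(row);
            copy.append(row);
        }
        // Step 9: store the complete result before flushing.
        store.put(query, copy.toString());
        out.flush();
    }

    // Placeholder for real query execution; returns two fixed JSON rows.
    private Iterator<String> execute(String query) {
        executions++;
        return Arrays.asList("{\"s\":\"a\"}", "{\"s\":\"b\"}").iterator();
    }
}
```

In this sketch the entry is stored only once the whole result has been produced, so a failure partway through the iteration leaves nothing half-written in the cache.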
Here is the code in my Jena fork:
https://github.com/samaitra/jena/tree/master/jena-fuseki2/jena-fuseki-core/src/main/java/org/apache/jena/fuseki/cache
https://github.com/samaitra/jena/blob/master/jena-fuseki2/jena-fuseki-core/src/main/java/org/apache/jena/fuseki/servlets/ResponseResultSet.java
https://github.com/samaitra/jena/blob/master/jena-fuseki2/jena-fuseki-core/src/main/java/org/apache/jena/fuseki/servlets/SPARQL_Query.java
Please let me know your feedback.
So far I have tested the implementation with ASK and SELECT queries. I
still need to test it with CONSTRUCT and DESCRIBE queries. I will also need
to make modifications to return a Thrift response.
Regards
Saikat
Hi Saikat,
A few comments and questions from looking at the code.
1/ At the moment the cache is deeply integrated into SPARQL_Query. Would
it be practical to have it as a separate service endpoint that invokes
SPARQL_Query? There could be different caches for each service, and
resources allocated appropriately.
2/ I found calling the individual cache entry a "Cache" object a bit
confusing. "Cache" is usually (to me) the whole thing. Would
"CacheEntry" be a better name?
3/ CacheBuilder - is the idea to cache the HTTP response as a
string, or rather the last response in the right content type? The most
benefit comes from not executing the query - I'm not sure that avoiding
the conversion of the results for sending is the big win.
4/ The thing cached is a "SPARQLResult" object from executeQuery.
Unfortunately, in the case of SELECT queries, ResultSets from query are
"use once" (they are iterators) so isn't the second time they are used
always going to be empty results? I think the code needs to take a copy
of the results and I didn't see any copy being taken. Did I miss it?
If a copy is needed, and to support disk usage, using the Thrift format
(when not an in-memory cache) should be fastest.
5/ How do cache entries get invalidated when an update happens?
6/ CacheAction - is this "work in progress" in some way? I wasn't able
to see what it was for.
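To make points 2/, 4/ and 5/ concrete, here is a rough sketch of what I have in mind. It is only an illustration under assumptions: ResultCacheSketch, CacheEntry, onUpdate and count are hypothetical names, plain Java iterators stand in for ResultSet, and dropping everything on update is just the coarsest possible invalidation policy. In the real code, Jena's ResultSetFactory.copyResults is probably the natural way to take the copy.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical names throughout; plain iterators stand in for ResultSet.
class ResultCacheSketch {
    // One cached result: an immutable, materialized copy of the rows.
    static final class CacheEntry {
        private final List<String> rows;

        // Drains the use-once iterator into a list at construction time.
        CacheEntry(Iterator<String> useOnce) {
            List<String> copy = new ArrayList<>();
            while (useOnce.hasNext()) {
                copy.add(useOnce.next());
            }
            this.rows = Collections.unmodifiableList(copy);
        }

        // Each call returns a fresh iterator over the copy.
        Iterator<String> rows() {
            return rows.iterator();
        }
    }

    // The cache proper: query string -> entry.
    private final Map<String, CacheEntry> entries = new ConcurrentHashMap<>();

    CacheEntry get(String query) {
        return entries.get(query);
    }

    void put(String query, Iterator<String> results) {
        entries.put(query, new CacheEntry(results));
    }

    // Coarsest possible policy: any update drops every entry.
    void onUpdate() {
        entries.clear();
    }

    // Helper for the usage note: consumes an iterator and counts its rows.
    static int count(Iterator<String> it) {
        int n = 0;
        while (it.hasNext()) {
            it.next();
            n++;
        }
        return n;
    }
}
```

Calling count twice on the same raw iterator gives the row count and then 0, which is the "use once" problem; calling it on entry.rows() gives the same count every time because each call iterates the copy.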
Andy