Hi Andy, Thank you so much for your feedback. Please find my replies inline.
On Tue, Feb 10, 2015 at 8:15 PM, Andy Seaborne <a...@apache.org> wrote: > On 03/02/15 11:13, Saikat Maitra wrote: > >> Hello Andy, >> >> I have build a prototype version for the SPARQL Query Cache >> implementation. >> >> I have provided the implementation details below: >> >> 1. Created CacheStore class with utility operation as doGet, doSet and >> doUnset cache. >> 2. Created CacheClient interface and implemented client for local in >> memory >> cache. >> 3. Created CacheAction class with enum fields as READ_CACHE and >> WRITE_CACHE. >> 4. Created Cache class as a wrapper object to hold the Cache result(Json >> result as of now) and SPARQLResultSet >> 5. I check in SPARQLQuery that if cache is null or is it initialised. >> 6. If it is null I set WRITE_CACHE action and pass cache object to >> ResponseResultSet. >> 7. ResponseResultSet creates a StringBuilder object and pass it to >> IndentedWriter. >> 8. As Query results are iterated and written in ServletOut Stream I also >> append the data in StringBuilder object. >> 9. Before flushing the data to Outstream I store the StringBuilder object >> which contain the json result in cache and set data in cache object has >> been initialised. >> 10. If CacheStore already contain the data then I retrieve the data from >> cache and write it to ServletOutStream and flush the data. >> >> Here are the code details in my jena fork. >> >> https://github.com/samaitra/jena/tree/master/jena-fuseki2/ >> jena-fuseki-core/src/main/java/org/apache/jena/fuseki/cache >> >> https://github.com/samaitra/jena/blob/master/jena-fuseki2/ >> jena-fuseki-core/src/main/java/org/apache/jena/fuseki/ >> servlets/ResponseResultSet.java >> >> https://github.com/samaitra/jena/blob/master/jena-fuseki2/ >> jena-fuseki-core/src/main/java/org/apache/jena/fuseki/ >> servlets/SPARQL_Query.java >> >> >> Please let me know your feedback. >> >> Currently I have tested the implementation with Ask and Select Queries. I >> still need to test it for Construct and Describe Queries. I will also need >> to make modification for returning thrift response. >> >> Regards >> >> Saikat >> >> > Hi Saikat, > > A few comments and questions from looking at the code. > > 1/ At the moment the cache is deeply integrated into SPARQL_Query. Would > it be practical to have it as a separate service endpoint that invokes > SPARQL_Query? There could be different caches for each service, and > resources allocated appropriately. > Yes, I am evaluating options to implement it as a separate service endpoints with servlet filters applied for queries to read and update cache entries. > > 2/ I found calling the individual cache entry a "Cache" object a bit > confusing. "Cache" is usually (to me) the whole thing. Would "CacheEntry" > be a better name? > Yes, I will refactor the Cache class to CacheEntry. > > 3/ CacheBuilder - is the idea where to cache the HTTP response as a > string, or rather the last response as the right content type? The most > benefit comes from not executing the query - I'm not sure that worrying > about converting the results to send is the big win. > The CacheBuilder is to store the cache results and is a member variable of Cache (CacheEntry) class. I started with storing results in string to test out json output but I will update it to store results of any format. > > 4/ The thing cached is a "SPARQLResult" object from executeQuery. > Unfortunately, in the case of SELECT queries, ResultSets from query are > "use once" (they are iterators) so isn't the second time they are used > always going to be empty results? I think the code needs to take a copy of > the results and I didn't see any copy being taken. Did I miss it? > The Cache (CacheEntry) class stores SPARQLResult, CacheBuilder and a flag which suggests if the Cache (CacheEntry) is initialised or not. Incase of Select queries I pass CacheBuilder along with ServletOutputStream and ResultSet to IndentedWriter which while iterating over the results keeps appending the results to CacheBuilder. At last before out.flush() I am setting the flag in Cache (CacheEntry) to be initialised. https://github.com/samaitra/jena/blob/master/jena-arq/src/main/java/org/apache/jena/atlas/io/IndentedWriter.java https://github.com/samaitra/jena/blob/master/jena-fuseki2/jena-fuseki-core/src/main/java/org/apache/jena/fuseki/servlets/ResponseResultSet.java > > If a copy is needed, and to support disk usage, using the Thrift format > (when not an in-memory cache) should be fastest. > > 5/ How do cache entries get invalidated when an update happens? > As of now the Cache (CacheEntry) are set to expire at 5 mins and no cache refresh policies are applied. I am planning to have a scheduled maintenance thread running clean up operation periodically. > > 6/ CacheAction - is this "work in progress" in some way. I wasn't able to > see what it was for. CacheAction has enum fields that helps to decide whether to read the Cache (CacheEntry) or to write (populate) the Cache (CacheEntry). It is used in ResponseResultSet. https://github.com/samaitra/jena/blob/master/jena-fuseki2/jena-fuseki-core/src/main/java/org/apache/jena/fuseki/servlets/ResponseResultSet.java > > > Andy > Regards Saikat