> (Apparently Hadoop, the main reason we shade Guava, does now in fact work
> with later Guavas than its dependency on an old Guava version.)
Andy, Rob: in this light, do you think it's worth trying an experiment to
unshade Guava? From what I understood of a previous conversation about this
question, the only real way to be sure that Jena and Hadoop are playing well
together is actually to make some deployments and try out different
behaviors; there's no way to write a simple integration test…

---
A. Soroka
The University of Virginia Library

> On Oct 19, 2015, at 6:49 AM, Andy Seaborne <[email protected]> wrote:
>
> These comments focus on the architecture of the proposal.
>
> Capturing output
>
> ResponseResultSet/ResultSetFormatter.
>
> See also the discussion point below for a different approach to caching
> that is not based on capturing the written output.
>
> The current design requires changes to ResultSetFormatter and every
> output format.
>
> An alternative is to create a replicating OutputStream which can output
> to two places, one of which can be the string capture. This localises
> changes to Fuseki only.
>
> Sketch (there may be better ways to achieve the same effect):
>
>     class OutputStream2 extends OutputStream {
>         public OutputStream2(OutputStream out1, OutputStream out2) { ... }
>         public void write(byte b[]) throws IOException {
>             if ( out1 != null ) out1.write(b) ;
>             if ( out2 != null ) out2.write(b) ;
>         }
>         ...
>     }
>
> ResponseResultSet.OutputContent can use OutputStream (it does not need
> ServletOutputStream).
>
>     OutputStream outServlet = action.response.getOutputStream() ;
>     OutputStream out ;
>     if ( writing to cache ) {
>         ByteArrayOutputStream outCatcher = new ByteArrayOutputStream() ;
>         out = new OutputStream2(outServlet, outCatcher) ;
>     } else {
>         out = outServlet ;
>     }
>     ...
>     proc.output(out) ;
>
> This will work if a new format is added (a Thrift-based binary format,
> for example) without needing the format to be aware of cache entry
> creation.
>
> It also means the caching is not exposed in the ARQ API.
>
> Caching and content negotiation
>
> The cache key is insensitive to the "Accept" header.
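(Stepping briefly out of the quote: the replicating-stream idea above could be completed along the following lines. `OutputStream2` is the name from Andy's sketch; the exact set of overrides and the demo harness are my own assumptions, not Fuseki code.)

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

/** Replicating OutputStream: every write goes to both wrapped streams. */
class OutputStream2 extends OutputStream {
    private final OutputStream out1;
    private final OutputStream out2;

    public OutputStream2(OutputStream out1, OutputStream out2) {
        this.out1 = out1;
        this.out2 = out2;
    }

    @Override public void write(int b) throws IOException {
        if (out1 != null) out1.write(b);
        if (out2 != null) out2.write(b);
    }

    @Override public void write(byte[] b, int off, int len) throws IOException {
        if (out1 != null) out1.write(b, off, len);
        if (out2 != null) out2.write(b, off, len);
    }

    @Override public void flush() throws IOException {
        if (out1 != null) out1.flush();
        if (out2 != null) out2.flush();
    }

    @Override public void close() throws IOException {
        if (out1 != null) out1.close();
        if (out2 != null) out2.close();
    }
}

public class TeeDemo {
    public static void main(String[] args) throws IOException {
        // Stand-ins for the servlet output stream and the cache capture.
        ByteArrayOutputStream servletSide = new ByteArrayOutputStream();
        ByteArrayOutputStream cacheSide = new ByteArrayOutputStream();
        try (OutputStream out = new OutputStream2(servletSide, cacheSide)) {
            out.write("result bytes".getBytes(StandardCharsets.UTF_8));
        }
        // Both sides received the same bytes.
        System.out.println(servletSide.toString(StandardCharsets.UTF_8.name()));
        System.out.println(cacheSide.toString(StandardCharsets.UTF_8.name()));
    }
}
```

A caveat on the sketch: `close()` here closes both streams, which is fine for a ByteArrayOutputStream capture but a real Fuseki integration would probably want to leave the servlet stream's lifecycle to the container.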
>
> The format of the output is determined by the "Accept" header. The query
> string output= is merely a non-standard way to achieve the same thing
> when it is hard to set the HTTP header (some lightweight scripting
> libraries).
>
> The current design writes the same format as the request, and uses the
> cache, but these two operations are different:
>
>     GET /datasets/query=SELECT * { ?s ?p ?o}
>     Accept: application/sparql-results+xml
>
>     GET /datasets/query=SELECT * { ?s ?p ?o}
>     Accept: application/sparql-results+json
>
> Cache ResultSet
>
> (Discussion point) A possibility is that the cache holds a copy of the
> ResultSet (as Java objects).
>
> Advantages:
>
> • The cached item is not in a particular format; content negotiation
>   happens per request.
> • OFFSET/LIMIT can be applied to the cached results if the original
>   query is executed without OFFSET/LIMIT (a weak version of paging).
>   See the experimental, sketch-only and out-of-date sparql-cache for
>   OFFSET/LIMIT processing.
>
> Disadvantages:
>
> • Does not stream.
> • Cache entries are re-serialized each time they are used.
>
> An iterator over the result set that captures the output while iterating
> would address the non-streaming disadvantage.
>
> Cache invalidation
>
> Update operations must invalidate the cache. A simple approach is to
> invalidate the whole cache; it is very hard to determine selectively
> whether an update affects a particular cache entry.
>
> Configuration and Control
>
> The cache is hard-wired and always on, which may not always be the right
> choice, so there needs to be a way to control it, possibly on a
> per-dataset basis. Note there is only one SPARQL_Query instance per
> Fuseki server due to the way dispatch is dynamic.
>
> Suggestion 1: A servlet SPARQL_Query_Cache catches requests and passes
> them on to a separate SPARQL_Query. This is the inheritance way to
> separate the cache code from the rest of processing.
> It works better if OFFSET/LIMIT control is going to be added later.
>
> Suggestion 2: It is primarily SPARQL_Query::sendResults being caught
> here, so a "result handler" set in SPARQL_Query would allow a separation
> of the cache code into its own class with just a hook in SPARQL_Query.
> This is the composition way to separate the cache code from the rest of
> processing.
>
> Reuse the Guava already in Jena
>
> Extend CacheGuava to have a constructor that takes a
> org.apache.jena.ext.com.google.common.cache.CacheBuilder.
>
> Background: Jena includes a shaded Guava 18 in
> org.apache.jena.ext.com.google.common in the artifact jena-shaded-guava.
> (Apparently Hadoop, the main reason we shade Guava, does now in fact
> work with later Guavas than its dependency on an old Guava version.)
>
> Extend to CONSTRUCT and DESCRIBE (future)
>
> Caching currently covers only SELECT and ASK queries. See ResponseModel
> and ResponseDataset for CONSTRUCT and DESCRIBE. The output capture point
> should make this extension possible.
>
> Documentation and tests
>
> Documentation and tests are needed.
>
> —
> Reply to this email directly or view it on GitHub.
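(One further unquoted aside, tying together the content-negotiation and cache-invalidation points from the email above: a minimal, stdlib-only sketch of a cache whose key includes both the query string and the negotiated content type, and which is cleared wholesale on any update. `QueryCache`, the `Key` record, and all method names here are illustrative assumptions, not Jena's CacheGuava or the Fuseki API; the actual suggestion in the thread is to pass a Guava CacheBuilder into CacheGuava.)

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Illustrative sketch only; not Fuseki's actual cache class. */
class QueryCache {
    /** Key = query string + content type, so cache hits respect "Accept". */
    record Key(String queryString, String contentType) {}

    private final Map<Key, byte[]> entries = new ConcurrentHashMap<>();

    byte[] get(String query, String contentType) {
        return entries.get(new Key(query, contentType));
    }

    void put(String query, String contentType, byte[] serializedResults) {
        entries.put(new Key(query, contentType), serializedResults);
    }

    /** Any update invalidates everything: selective invalidation is hard. */
    void invalidateAll() {
        entries.clear();
    }
}

public class CacheDemo {
    public static void main(String[] args) {
        QueryCache cache = new QueryCache();
        String q = "SELECT * { ?s ?p ?o }";
        cache.put(q, "application/sparql-results+xml", new byte[] { 1 });

        // Same query, different Accept header: a distinct cache entry.
        System.out.println(cache.get(q, "application/sparql-results+json") == null);
        System.out.println(cache.get(q, "application/sparql-results+xml") != null);

        // An update clears the whole cache.
        cache.invalidateAll();
        System.out.println(cache.get(q, "application/sparql-results+xml") == null);
    }
}
```

The byte-array values correspond to the "capture the written output" design; under the "cache the ResultSet as Java objects" alternative discussed above, the content type would drop out of the key and serialization would happen per request instead.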
