If there were an ETag per dataset, and a method on the dataset to force an ETag reset, would this address the index issue, in that the indexer could reset the ETag whenever it deemed appropriate?
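
A rough sketch of that idea, to make the question concrete. Everything here is hypothetical: `DatasetETag`, `reset()` and `respond()` are illustrative names and not existing Jena or Fuseki API, and a simple in-memory counter stands in for whatever would really generate the tag.

```python
import threading


class DatasetETag:
    """Hypothetical per-dataset ETag holder; not part of Jena or Fuseki."""

    def __init__(self, dataset_name):
        self._name = dataset_name
        self._version = 0
        self._lock = threading.Lock()

    def current(self):
        # Entity tags are quoted strings in HTTP.
        with self._lock:
            return '"%s-%d"' % (self._name, self._version)

    def reset(self):
        # Called by the server after a mutating request completes, or by an
        # external indexer that changed the data without going through it.
        with self._lock:
            self._version += 1


def respond(etag_holder, if_none_match):
    """Return (status, headers) for a read request against the dataset."""
    tag = etag_holder.current()
    if if_none_match == tag:
        # The client's cached copy is still valid: no body needed.
        return 304, {"ETag": tag}
    # Otherwise send the full result along with the current tag.
    return 200, {"ETag": tag}
```

With this shape, an indexer that modifies the data outside the server would only need a handle on the dataset's `DatasetETag` and a call to `reset()`. A real version would also need tags that stay unique across restarts, which a counter starting at zero does not give.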
In any case I would go with the first choice. Is there anything that prohibits sending both an ETag and a constant Expires? I haven't looked, but I recall they are not mutually exclusive.

Claude

On Mon, Jun 29, 2015 at 2:04 PM, A. Soroka <[email protected]> wrote:

> A quick discussion of ETags in the "backup admin" PR that was sent by Yang
> Yuanzhe led me to this issue:
>
> https://issues.apache.org/jira/browse/JENA-388
>
> for "Make Fuseki responses cacheable" and which has been around for a
> little while. I was wondering about a couple of potential approaches here
> and thought I would run them down:
>
> 1) ETag-per-Dataset: this is a single ETag value for any Dataset for all
> requests, updated whenever a mutating request completes. This would work by
> letting any change on a Dataset whatsoever that comes through Fuseki
> invalidate all ETag-based caching on that Dataset. This seems to be where
> Andy Seaborne and Rob Vesse were heading, but I obviously can't speak for
> them. Advantage: relatively simple. Disadvantages: changes in the indexes
> not performed by Fuseki will not be reflected properly, only useful for
> instances that receive the right patterns of changes (meaning for which
> mutations aren't too "evenly sprinkled" amongst queries, thus keeping the
> cache often invalidated).
>
> 2) Constant Expires: Rob Vesse discusses this a bit in the issue. It's an
> Expires header that is configurable to allow some admin adjustment, but is
> constant during runtime. Advantage: dead simple. Disadvantage: unless the
> usage scenario is very tightly controlled, there's going to be some leakage
> of stale data. That may or may not be a big problem for an integrator,
> depending on use case. It would have to be carefully documented, I think,
> to avoid nasty surprises.
>
> 3) Per-query ETag: This would mean some kind of map from request to
> ETag from which ETag headers are supplied for every request. The problem
> with this is that it implies some kind of reasonable algorithm for
> determining when an arbitrary update makes sufficient changes in an
> arbitrary graph to affect another arbitrary query, or it would imply
> stretching the meaning of "weak" ETag to a point that is probably not
> useful or correct for a query endpoint. This doesn't seem very practical.
>
> 4) Per-query-for-some-queries ETag. The idea here would be to cut down
> option 3 to a tranche of queries for which there actually _does_ exist some
> reasonable algorithm for detecting changes in the query-results. The
> example that comes to mind here would be simple DESCRIBE queries. Since it
> seems that ARQ deals with DESCRIBE using only relationships "outbound" from
> the things described, this approach could use an expiring map from URIs to
> ETags which could be updated (perhaps using a StatementListener) when a
> change directly affects a URI or a blank node in the CBD of that URI. This
> could be expensive, but it might be worth it for some use cases, for
> example where integrators are using software like Pubby to publish RDF.
> There might be other examples of query pattern where changes are
> practically calculable.
>
> Whether (and how far) any of these are worth pursuing depends a good bit
> on the use case in hand. For example, for my use cases, option 2 isn't
> really practical, because one of the applications taking results from
> Fuseki would be using them to present live-editing pages. Option 1 would
> work, and it would give some advantage. Option 4 isn't interesting because
> very few of the queries in play will be simple DESCRIBE queries. But that's
> all based on my use case.
>
> Do you think any of these are worth pursuing?
>
> ---
> A. Soroka
> The University of Virginia Library
>

--
I like: Like Like - The likeliest place on the web <http://like-like.xenei.com>
LinkedIn: http://www.linkedin.com/in/claudewarren
