If there were an ETag per dataset, and a method on the dataset to force an
ETag reset, would this address the index issue, in that the indexer could
reset the ETag when it deemed appropriate?
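
A rough sketch of the idea, for illustration only. The class DatasetETag and
its methods are hypothetical names and are not part of Jena or Fuseki; the
assumption is that the server calls reset() after each mutating request, and
that an external indexer can call the same method when it changes the indexes.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper: one ETag value per dataset, bumped on demand.
public class DatasetETag {
    private final String datasetName;
    private final AtomicLong version = new AtomicLong();

    public DatasetETag(String datasetName) {
        this.datasetName = datasetName;
    }

    // Build the quoted ETag string for a given version number.
    private String format(long v) {
        return "\"" + datasetName + "-" + v + "\"";
    }

    // The ETag to send with every response for this dataset.
    public String current() {
        return format(version.get());
    }

    // Force a new ETag. Intended for the server after an update,
    // or for an indexer that changed the data behind the server's back.
    public String reset() {
        return format(version.incrementAndGet());
    }

    // True if the client's If-None-Match value is still current,
    // in which case the server could answer 304 Not Modified.
    public boolean matches(String ifNoneMatch) {
        return ifNoneMatch != null && ifNoneMatch.equals(current());
    }
}
```

With something like this, any caller that knows the data changed can
invalidate cached responses without the change having gone through Fuseki.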

In any case I would go with the first choice.

Is there anything that prohibits sending both an ETag and a constant
Expires? I haven't looked, but I recall they are not mutually exclusive.
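
For illustration, a response could carry both headers: a cache treats the
response as fresh until the Expires time, and after that can revalidate with
If-None-Match against the ETag. A minimal sketch follows; the class
CacheHeaders and the build method are hypothetical names, not Fuseki code.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper that produces both caching headers for one response.
public class CacheHeaders {
    // HTTP date format, e.g. "Sun, 06 Nov 1994 08:49:37 GMT".
    private static final DateTimeFormatter HTTP_DATE =
        DateTimeFormatter.ofPattern("EEE, dd MMM yyyy HH:mm:ss 'GMT'", Locale.US)
                         .withZone(ZoneOffset.UTC);

    // etag: the current dataset ETag; lifetime: the admin-configured constant.
    public static Map<String, String> build(String etag, Instant now, Duration lifetime) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("ETag", etag);
        headers.put("Expires", HTTP_DATE.format(now.plus(lifetime)));
        return headers;
    }
}
```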

Claude

On Mon, Jun 29, 2015 at 2:04 PM, A. Soroka <[email protected]> wrote:

> A quick discussion of ETags in the "backup admin" PR that was sent by Yang
> Yuanzhe led me to this issue:
>
> https://issues.apache.org/jira/browse/JENA-388
>
> for "Make Fuseki responses cacheable" and which has been around for a
> little while. I was wondering about a couple of potential approaches here
> and thought I would run them down:
>
> 1) ETag-per-Dataset: this is a single ETag value for any Dataset for all
> requests, updated whenever a mutating request completes. This would work by
> letting any change on a Dataset whatsoever that comes through Fuseki
> invalidate all ETag-based caching on that Dataset. This seems to be where
> Andy Seaborne and Rob Vesse were heading, but I obviously can't speak for
> them. Advantage: relatively simple. Disadvantages: changes in the indexes
> not performed by Fuseki will not be reflected properly, only useful for
> instances that receive the right patterns of changes (meaning for which
> mutations aren't too "evenly sprinkled" amongst queries, thus keeping the
> cache often invalidated).
>
> 2) Constant Expires: Rob Vesse discusses this a bit in the issue. It's an
> Expires header that is configurable to allow some admin adjustment, but is
> constant during runtime. Advantage: dead simple. Disadvantage: unless the
> usage scenario is very tightly controlled, there's going to be some leakage
> of stale data. That may or may not be a big problem for an integrator,
> depending on use case. It would have to be carefully documented, I think,
> to avoid nasty surprises.
>
> 3) Per-query ETag: This would mean some kind of map from request to
> ETag from which ETag headers are supplied for every request. The problem
> with this is that it implies some kind of reasonable algorithm for
> determining when an arbitrary update makes sufficient changes in an
> arbitrary graph to affect another arbitrary query, or it would imply
> stretching the meaning of "weak" ETag to a point that is probably not
> useful or correct for a query endpoint. This doesn't seem very practical.
>
> 4) Per-query-for-some-queries ETag. The idea here would be to cut down
> option 3 to a tranche of queries for which there actually _does_ exist some
> reasonable algorithm for detecting changes in the query-results. The
> example that comes to mind here would be simple DESCRIBE queries. Since it
> seems that ARQ deals with DESCRIBE using only relationships "outbound" from
> the things described, this approach could use an expiring map from URIs to
> ETags which could be updated (perhaps using a StatementListener) when a
> change directly affects a URI or a blank node in the CBD of that URI. This
> could be expensive, but it might be worth it for some use cases, for
> example where integrators are using software like Pubby to publish RDF.
> There might be other examples of query pattern where changes are
> practically calculable.
>
> Whether (and how far) any of these are worth pursuing depends a good bit
> on the use case in hand. For example, for my use cases, option 2 isn't
> really practical, because one of the applications taking results from
> Fuseki would be using them to present live-editing pages. Option 1 would
> work, and it would give some advantage. Option 4 isn't interesting because
> very few of the queries in play will be simple DESCRIBE queries. But that's
> all based on my use case.
>
> Do you think any of these are worth pursuing?
>
> ---
> A. Soroka
> The University of Virginia Library
>
>


-- 
I like: Like Like - The likeliest place on the web
<http://like-like.xenei.com>
LinkedIn: http://www.linkedin.com/in/claudewarren
