On Tue, 12 Jun 2012 16:51:42 -0600 Leif Hedstrom <zw...@apache.org> wrote:
> On 6/11/12 11:20 PM, Nick Kew wrote:
> > On 12 Jun 2012, at 02:49, Leif Hedstrom wrote:
> >
> >> Another option that I've been pondering, which is more suitable for
> >> large contents I think, is to kick off background requests for the
> >> full objects. A plugin that does something like:
> >>
> >> Thoughts?
> >
> > I seem to recollect discussing approaches to caching ranges recently
> > (was it with you?)
> >
> > What you outline makes sense where the resource is much bigger than
> > the requested range, and fetching the whole thing (in a rangeless
> > request to the backend) would be too much overhead. But to enforce
> > it on all range requests could be overkill. I wonder if there's a case
> > for adding a heuristic to examine the client's ranges, and fetch
> > the whole thing while the client waits UNLESS the number of
> > bytes the client wants to skip exceeds some threshold - which
> > then triggers what you describe?
> >
>
> Yeah, I totally agree. There are some potential alternatives here, such
> as fixed-size chunking of content perhaps. It's still a difficult
> problem to solve optimally for every type of request. Your suggested
> heuristic is probably reasonable for many cases, but what if the client
> asks for the first 16KB and we have no idea how long the object is (it
> could be e.g. 512GB)? Do we defer dealing with it until we have
> collected enough data to make an intelligent decision?
>
> Also, blindly caching every Range: request could potentially completely
> fill the cache with responses that partially overlap (there are no
> restrictions on how the client can form the Range requests :/).

Are we at cross-purposes here? If the client requests the first 16KB,
then a rangeless request to the backend fetches that 16KB first, so the
client can be satisfied while the proxy continues filling the cache.
That requires decoupling the client and server requests. I wasn't
suggesting caching any ranged request!
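The skip-threshold decision quoted above could be sketched roughly as
follows (a hypothetical illustration, not ATS plugin code; the function
name, the return labels, and the 1 MiB threshold are all my own
assumptions):

```python
def plan_for_range(first_byte_wanted, skip_threshold=1 << 20):
    """Decide how to serve a 'bytes=first_byte_wanted-...' request.

    If the client skips no more than skip_threshold bytes, make a
    rangeless request to the backend: the client's bytes arrive first
    and can be served while the proxy keeps filling the cache (this is
    why an unknown total length, e.g. 16KB of a 512GB object, is not
    fatal).  Otherwise serve the range as-is and fill the cache with a
    background request.
    """
    if first_byte_wanted > skip_threshold:
        return "proxy_range_and_background_fill"
    return "fetch_full_inline"
```

For example, `plan_for_range(0)` (first 16KB of anything) would fetch
the whole object inline, while `plan_for_range(10 << 30)` (skipping
10 GiB) would trigger the background fill instead.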
Thinking about it, perhaps an optimal heuristic is, on receipt of a
range request, to make two requests to the backend: the request as-is,
and (if cacheable) a second background request for everything but the
range. The background request grabs a mutex so we don't duplicate
requests to a URL, and when done it reassembles the entire response in
cache. Any other ranged requests for the same URL arriving while the
URL is mutexed just run without caching. That's actually looking a lot
like your original proposal!

-- 
Nick Kew
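The per-URL mutex described in that last paragraph might look something
like this minimal sketch (again hypothetical, with invented names; a
real plugin would do this around the backend request machinery):

```python
import threading

# Set of URLs with a background fill currently in flight, guarded by a
# lock.  The first ranged request for a URL wins the "mutex" and starts
# the everything-but-the-range background request; later ranged
# requests for the same URL are proxied straight through, uncached.
_inflight = set()
_inflight_lock = threading.Lock()

def try_start_background_fill(url):
    """Return True if the caller should start the background fill for
    `url`; False if one is already running (just proxy, don't cache)."""
    with _inflight_lock:
        if url in _inflight:
            return False
        _inflight.add(url)
        return True

def finish_background_fill(url):
    """Release the mutex once the full response has been reassembled
    in cache (or the background request failed)."""
    with _inflight_lock:
        _inflight.discard(url)
```

The design point is simply that concurrent ranged requests never
duplicate the expensive full-object fetch: at most one background fill
per URL runs at a time.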