On 20 Sep 2010, at 12:52 PM, Niklas Edmundsson wrote:
As we cache files from an NFS mount, we hash on device:inode as a
simple method of reducing duplicates of files (say, a dozen URLs all
resolving to the same DVD image). We see a huge benefit from being
able to do this, as we get a grotesque amount of data duplication
otherwise.
So we usually have multiple header files all pointing to the same
data file.
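
A minimal sketch of the device:inode idea, assuming a POSIX system
where stat() reports st_dev and st_ino for the cached file. The helper
name make_dev_inode_key() and the hex "dev:ino" key format are
hypothetical, chosen only for illustration; they are not taken from
any existing cache module.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: build a "device:inode" key for the file at
 * `path`. Any two paths that resolve to the same file on the same
 * device produce the same key, so their data can share one cache
 * file. Returns 0 on success, -1 if stat() fails or the buffer is
 * too small.
 */
static int make_dev_inode_key(const char *path, char *buf, size_t buflen)
{
    struct stat st;
    int n;

    if (stat(path, &st) != 0) {
        return -1;
    }

    /* Cast to a fixed wide type; dev_t and ino_t widths vary by platform. */
    n = snprintf(buf, buflen, "%llx:%llx",
                 (unsigned long long)st.st_dev,
                 (unsigned long long)st.st_ino);

    return (n < 0 || (size_t)n >= buflen) ? -1 : 0;
}
```

Each header file would then record this key (rather than the URL) as
the name of its data file, which is how several URLs can end up
pointing at one copy of the data.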
For the more generic cache this might also be useful, provided that you
have a mechanism to identify duplicated data. The only thing I can
think of is hashing the data blocks, but that isn't really feasible
for large files. I suspect there are use cases where a backend can
provide hints for this, though.
I think this use case is bordering on something that would need to be
in its own module, rather than trying to stretch mod_disk_cache to be
aware of FILE buckets. Something like mod_diskfile_cache (or
similar; mod_file_cache already exists and probably should have been
called mod_fd_cache, but oh well).
Hmmm...
I notice the interface for create_entity() in the cache provider
doesn't pass the output bucket brigade through to the provider.
This would be useful in this case, because a dedicated file caching
provider module might want to look inside the brigade to see if it
contains a single FILE bucket, and if not, to DECLINE the request to
cache.
Does such a change sound sensible?
    int (*create_entity) (cache_handle_t *h, request_rec *r,
                          const char *urlkey, apr_off_t len,
                          apr_bucket_brigade *bb);
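
To make the intent concrete, here is a rough sketch of the check a
file-only provider might perform once it can see the brigade. It does
not use the real APR types: the bucket struct, the bucket_type enum
and the SKETCH_OK / SKETCH_DECLINED values are simplified stand-ins
invented for this example, and allowing a trailing EOS bucket after
the FILE bucket is an assumption on my part. A real provider would
walk the apr_bucket_brigade with the APR bucket macros instead.

```c
#include <stddef.h>

/* Simplified stand-ins for APR buckets; NOT the real APR API. */
typedef enum { BUCKET_FILE, BUCKET_HEAP, BUCKET_EOS } bucket_type;

typedef struct bucket {
    bucket_type type;
    struct bucket *next;    /* NULL marks the end of the brigade */
} bucket;

/* Hypothetical return codes standing in for OK / DECLINED. */
#define SKETCH_OK         0
#define SKETCH_DECLINED (-1)

/*
 * Accept the request only if the brigade is exactly one FILE bucket,
 * optionally followed by a single EOS bucket. Anything else (empty
 * brigade, non-FILE data, several buckets) is declined so that
 * another cache provider can handle it.
 */
static int file_only_create_entity(const bucket *first)
{
    const bucket *b;

    if (first == NULL || first->type != BUCKET_FILE) {
        return SKETCH_DECLINED;
    }

    b = first->next;
    if (b != NULL && b->type == BUCKET_EOS) {
        b = b->next;        /* skip the assumed trailing EOS */
    }

    return (b == NULL) ? SKETCH_OK : SKETCH_DECLINED;
}
```

With the current interface the provider cannot make this decision in
create_entity(), because the brigade is not passed in at that point.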
Regards,
Graham