On 20 Sep 2010, at 12:52 PM, Niklas Edmundsson wrote:

As we cache files from an NFS mount, we hash on device:inode as a simple method of reducing duplicates of files (say a dozen URLs all resolving to the same DVD image). We see a huge benefit from being able to do this, as we get a grotesque amount of data duplication otherwise.

So we usually have multiple header files all pointing to the same data file.
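As a rough illustration of the device:inode idea above, here is a minimal sketch in plain POSIX C. The helper names dev_inode_key() and same_data_file() are made up for this example and are not part of mod_disk_cache or any httpd API; the hex "dev:ino" key format is likewise just an assumption.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical helper: write a "device:inode" key for path into buf.
 * Returns 0 on success, -1 if stat() fails or buf is too small. */
int dev_inode_key(const char *path, char *buf, size_t buflen)
{
    struct stat st;
    int n;

    if (stat(path, &st) != 0) {
        return -1;
    }
    n = snprintf(buf, buflen, "%llx:%llx",
                 (unsigned long long) st.st_dev,
                 (unsigned long long) st.st_ino);
    return (n < 0 || (size_t) n >= buflen) ? -1 : 0;
}

/* Hypothetical helper: 1 if both paths map to the same key (and so could
 * share one cached data file), 0 if they differ, -1 on error. */
int same_data_file(const char *a, const char *b)
{
    char ka[64], kb[64];

    if (dev_inode_key(a, ka, sizeof ka) != 0 ||
        dev_inode_key(b, kb, sizeof kb) != 0) {
        return -1;
    }
    return strcmp(ka, kb) == 0 ? 1 : 0;
}
```

Two URLs whose files produce the same key would then get separate header files but share a single data file.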

For the more generic cache it might also be useful, provided that you have a mechanism to identify duplicated data. The only thing I can think of is hashing on the data block, but that isn't really feasible for large files. I suspect there are use cases where a backend can provide hints for this, though.

I think this use case is bordering on something that would need to be in its own module, rather than trying to stretch mod_disk_cache to be aware of FILE buckets. Something like mod_diskfile_cache (or something; mod_file_cache already exists and probably should have been called mod_fd_cache, but oh well).

Hmmm...

I notice the interface for create_entity() in the cache provider doesn't pass the output bucket brigade through to the provider.

This would be useful in this case, because a dedicated file caching provider module might want to look inside the brigade to see if it contains a single FILE bucket, and if not, to DECLINE the request to cache.

Does such a change sound sensible?

    int (*create_entity) (cache_handle_t *h, request_rec *r,
                          const char *urlkey, apr_off_t len,
                          apr_bucket_brigade *bb);
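
To make the "single FILE bucket, otherwise decline" check concrete, here is a self-contained sketch. It deliberately does not use the real APR types: struct bucket, the bucket_type enum, the CACHE_OK / CACHE_DECLINED values and check_single_file_bucket() are all stand-ins invented for this example. A real provider would walk the apr_bucket_brigade with the APR macros instead, and would probably have to decide what to do about other metadata buckets.

```c
#include <stddef.h>

/* Stand-in types for illustration only, NOT the real APR definitions. */
enum bucket_type { BUCKET_FILE, BUCKET_HEAP, BUCKET_EOS };

struct bucket {
    enum bucket_type type;
    struct bucket *next;    /* NULL terminates this toy brigade */
};

/* Stand-in return codes for "go ahead and cache" and "decline". */
#define CACHE_OK         0
#define CACHE_DECLINED (-1)

/* Accept only a brigade made of exactly one FILE bucket, optionally
 * followed by a single EOS bucket. Anything else is declined. */
int check_single_file_bucket(const struct bucket *head)
{
    const struct bucket *e = head;

    if (e == NULL || e->type != BUCKET_FILE) {
        return CACHE_DECLINED;
    }
    e = e->next;
    if (e != NULL && e->type == BUCKET_EOS) {
        e = e->next;        /* a trailing EOS is tolerated */
    }
    return (e == NULL) ? CACHE_OK : CACHE_DECLINED;
}
```

With the extra bb argument, create_entity() in a file caching provider could run a check along these lines and decline anything that is not a plain file.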

Regards,
Graham