Heya,

We've got a number of applications that require scalable storage. They have different front-end and business requirements, but they often end up containing the same files (largely images).

We're considering using MogileFS as a storage solution, with an MD5 sum (or SHA-xx) of each file's contents as the key. Each application would store this key in its own database, along with all of its application-dependent metadata. This would guarantee* that we never 'waste' storage by storing the same file multiple times, and it wouldn't require major changes to our applications.

Has anybody used a function of the file contents as the key before? Good idea or bad idea?

Cheers!
Shez

* OK, hash collisions are always possible, so filelength:SHA-256 would be a better key.
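For illustration, here is a minimal Python sketch of the filelength:SHA-256 key scheme described above. It assumes the application can stream a file's bytes before uploading. The names `content_key` and `DedupStore` are hypothetical, and the in-memory dictionary is only a stand-in for the shared storage backend. It is not the MogileFS client API.

```python
import hashlib
import io
from typing import BinaryIO, Dict


def content_key(stream: BinaryIO, chunk_size: int = 1 << 16) -> str:
    """Return a 'length:sha256hex' key for all bytes read from stream."""
    digest = hashlib.sha256()
    length = 0
    # Read in chunks so large files are never held in memory all at once.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
        length += len(chunk)
    return f"{length}:{digest.hexdigest()}"


class DedupStore:
    """Hypothetical in-memory stand-in for shared storage (NOT MogileFS's API)."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        key = content_key(io.BytesIO(data))
        # Identical content yields an identical key, so it is stored only once.
        self._blobs.setdefault(key, data)
        # Each application would save this key in its own database.
        return key

    def __len__(self) -> int:
        return len(self._blobs)


if __name__ == "__main__":
    store = DedupStore()
    k1 = store.put(b"same image bytes")  # uploaded by application A
    k2 = store.put(b"same image bytes")  # same file uploaded by application B
    print(k1 == k2, len(store))
```

Because the key is derived only from the content, two applications uploading the same image independently end up referencing a single stored copy. Each application keeps its own metadata alongside the key.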
