> We've got a number of applications that require scalable storage, that have
> different front end and business requirements but often end up containing the
> same files (largely images).
>
> We're considering using mogilefs as a storage solution, using a md5 sum (or
> SHA-xx) of the file's contents as the key. This key would be stored by each
> application in their own databases along with all the metainformation which
> is application dependent.
>
> This would provide a guarantee* that we were never 'wasting' storage by
> storing the same file multiple times, without making major changes to our
> applications.
>
> Has any body used a function of the file contents as the key before? Good
> idea/bad idea?
The FotoBilder software (which powers PicPix.com and LiveJournal's ScrapBook
service) uses this concept. It doesn't use the hash as the MogileFS key,
though; it uses the hash to reference a record in the database, where it can
then look up the picture's ID. That also lets it keep track of how many people
are using a picture, so it can be cleaned up later.

It's a fine idea, and it will work great either way you do it. :)

-- 
Mark Smith / xb95
[EMAIL PROTECTED]
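
For anyone who wants to see the idea in code, here is a rough sketch of the hash-plus-reference-count approach described above. It is not FotoBilder's code and it does not talk to MogileFS: the `DedupStore` class, its method names, and the two in-memory dicts (standing in for the database table and the storage backend) are all made up for illustration, and SHA-256 is just one possible choice of hash.

```python
import hashlib


class DedupStore:
    """Hypothetical in-memory stand-in for a hash table plus a blob store."""

    def __init__(self):
        self.blobs = {}      # content hash -> file bytes (stand-in for the storage backend)
        self.refcounts = {}  # content hash -> number of references held by applications

    @staticmethod
    def content_key(data: bytes) -> str:
        # The key is a function of the file contents only.
        return hashlib.sha256(data).hexdigest()

    def put(self, data: bytes) -> str:
        """Store the bytes once per distinct content and count the reference."""
        key = self.content_key(data)
        if key not in self.blobs:
            self.blobs[key] = data
        self.refcounts[key] = self.refcounts.get(key, 0) + 1
        return key

    def release(self, key: str) -> None:
        """Drop one reference; delete the bytes when nobody uses them any more."""
        remaining = self.refcounts[key] - 1
        if remaining > 0:
            self.refcounts[key] = remaining
        else:
            del self.refcounts[key]
            del self.blobs[key]


# Two applications uploading the same image end up sharing one stored copy.
store = DedupStore()
key_a = store.put(b"same image bytes")
key_b = store.put(b"same image bytes")
```

Each application would keep the returned key in its own database next to its own metadata. Whether the hash is used directly as the storage key or only to look up a record that points at the file is a design choice; the sketch collapses the two for brevity.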
