On Monday, October 3, 2016 at 15:30:47 UTC+2, Jai Sharma wrote:
> I have been looking to use Camlistore as my, uhh, personal storage for life.
>
> Since I am using commodity hard drives with the maximum capacity-to-dollar ratio, let's just say the drives are not at the upper end of the reliability spectrum. I am planning to counter this by periodically syncing blobs to Backblaze B2 (they are cheaper than Amazon or Google). But in order to determine when it's time to change hard drives, I want to write an integrity check mechanism that can alert me, via email, when my main local hard drive starts having integrity failures.
>
> I think there is a TODO item for something similar to this. But I can take this on myself; I am just not familiar enough with the code base.
>
> Ideally I would like to operate at the "dumb" blob level with no knowledge of the graph structure, but I can't even find where the blobs are stored, which scares me. In my possibly incorrect understanding, Camlistore abstracts the blob storage policy (packed vs. unpacked?), which means the blobs can end up anywhere under the tree (at "blobPath" from the server config) on disk. IMHO that's the incorrect approach, at least if the storage policy is defaulted in code. At the very least, the storage policy should be introspectable from the tree at "blobPath", somehow, so that the integrity check mechanism can enumerate blobs directly from the directory tree with no knowledge of the Camlistore source. This just removes another point of failure (potential bugs in the Camlistore blob server mechanism).

The "file" storage stores blobs under sha1/aa..ff/ directories, one file per blob. Corruption can be checked by comparing the SHA-1 of the file's contents with its name: they must be equal. Detecting a missing file is harder, because that requires understanding the blobs.
But for simple checking, this should be enough (every corruption I have seen was zeroed blocks or files, or unreadable directories).
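
As a rough illustration of the name-versus-content check for the "file" storage, here is a minimal Python sketch. It is not part of Camlistore. It assumes that each blob file name embeds its digest as `sha1-<40 hex digits>` (the exact file naming and directory depth under `blobPath` are assumptions, so adjust the pattern to what you actually see on disk). The names `check_tree`, `sha1_of_file` and `BLOB_NAME` are hypothetical helpers made up for this example.

```python
import hashlib
import os
import re

# Assumption: a blob's file name contains "sha1-" followed by the 40-digit
# lowercase hex digest of its contents. Adjust if your tree looks different.
BLOB_NAME = re.compile(r"sha1-([0-9a-f]{40})")


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-1 of a file, reading it in chunks."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def check_tree(root):
    """Walk root and return a list of (path, problem) for failing blobs."""
    problems = []

    def on_walk_error(err):
        # os.walk reports directories it cannot list through this callback.
        problems.append((err.filename, "unreadable directory: %s" % err.strerror))

    for dirpath, _dirnames, filenames in os.walk(root, onerror=on_walk_error):
        for name in filenames:
            match = BLOB_NAME.search(name)
            if match is None:
                continue  # not a blob file under the assumed naming scheme
            path = os.path.join(dirpath, name)
            try:
                actual = sha1_of_file(path)
            except OSError as err:
                problems.append((path, "unreadable file: %s" % err.strerror))
                continue
            if actual != match.group(1):
                problems.append((path, "digest mismatch: got %s" % actual))
    return problems
```

Calling `check_tree` on the directory configured as "blobPath" returns an empty list when every blob file matches its name. A non-empty result could be what triggers the email alert. As noted above, this only catches corrupted or unreadable files, not missing ones.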

The "diskpacked" storage stores blobs in pack-99999.blobs files, and camtool has a built-in checker for them (camtool reindex-diskpacked), which scans the blobs sequentially and checks whether they are in the diskpacked index (diskpacked-index.leveldb).

You can augment those files with PAR files to allow in-place repair, but I don't know whether this would be enough: to be able to restore a file after 10% of it has been zeroed out, you'll need 10% additional space. But will that be sufficient? The only reasonable solution is to also back them up somewhere else.
