On Monday, October 3, 2016 at 15:30:47 UTC+2, Jai Sharma wrote:
>
> I have been looking to use Camlistore as my, uhh, personal storage for 
> life.
>
> Since I am using commodity hard drives with the best capacity-to-dollar
> ratio, let's just say the drives are not at the upper end of the
> reliability spectrum. I am planning to counter this by periodically
> syncing blobs to Backblaze B2 (they are cheaper than Amazon or Google).
> But in order to determine when it's time to change hard drives, I want to
> write an integrity check mechanism which can alert me, via email, when my
> main local hard drive starts having integrity failures.
>
> I think there is a TODO item for something similar to this. I can take
> this on myself; I am just not familiar enough with the code base.
>
> Ideally I would like to operate at the "dumb" blob level with no knowledge
> of the graph structure, but I can't even find where the blobs are stored,
> which worries me quite a bit. In my (possibly incorrect) understanding,
> Camlistore abstracts the blob storage policy (packed vs. unpacked?), which
> means the blobs can end up anywhere under the tree (at "blobPath" from the
> server config) on disk. IMHO that's the wrong approach, at least if the
> storage policy is defaulted in code. At the very least, the storage policy
> should be discoverable from the tree at "blobPath" somehow, so that the
> integrity check mechanism can enumerate blobs directly from the directory
> tree with no knowledge of the Camlistore source. This just removes another
> point of failure (potential bugs in the Camlistore blob server mechanism).
>
>
The "file" storage stores blobs under sha1/aa..ff/ directories, one file
per blob. Corruption can be checked by comparing the SHA-1 of each file's
contents with its name: they must be equal.
A missing file is harder to detect, since that requires understanding the
blobs (knowing which ones should exist). But for simple checking, this
should be enough (all the corruption I've seen meant zeroed blocks/files or
unreadable directories).

The "diskpacked" storage stores blobs in pack-99999.blobs files, and
camtool has a built-in checker for them (camtool reindex-diskpacked), which
scans the blobs sequentially and checks whether they are in the diskpacked
index (diskpacked-index.leveldb).
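The same content check can be applied to pack files with a sequential scan. The sketch below assumes each record is framed as `[sha1-<hex> <size>]` followed by exactly `size` bytes of blob data; that framing is an assumption about the diskpacked format, not something stated here, so verify it against your own pack files first. Unlike camtool's checker, it only re-hashes contents and never consults diskpacked-index.leveldb, and it does not know about any deletion markers the real format may use, so treat its findings as leads to cross-check rather than proof. `scan_pack` and `scan_pack_file` are hypothetical names.

```python
import hashlib
import re

# Assumed record header: "[" + blobref + " " + decimal size + "]".
HEADER = re.compile(rb"\[(sha1-[0-9a-f]{40}) (\d+)\]")


def scan_pack(data):
    """Scan the bytes of one pack file sequentially.

    Returns (ok_refs, problems), where problems is a list of
    (offset, message). Stops at the first unparsable header or
    truncated record, since record framing is lost after that.
    """
    ok, problems = [], []
    pos = 0
    while pos < len(data):
        m = HEADER.match(data, pos)
        if not m:
            problems.append((pos, "bad or missing record header"))
            break
        ref, size = m.group(1).decode(), int(m.group(2))
        start, end = m.end(), m.end() + size
        if end > len(data):
            problems.append((pos, "%s truncated" % ref))
            break
        digest = "sha1-" + hashlib.sha1(data[start:end]).hexdigest()
        if digest == ref:
            ok.append(ref)
        else:
            problems.append((pos, "%s content hashes to %s" % (ref, digest)))
        pos = end
    return ok, problems


def scan_pack_file(path):
    """Read a whole pack file into memory (for simplicity) and scan it."""
    with open(path, "rb") as f:
        return scan_pack(f.read())
```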

You can augment those files with PAR files to allow in-place resurrection,
but I don't know whether this would be enough: to be able to resurrect a
file if 10% of it zeroes out, you'll need 10% additional space. But will
that be sufficient? The only reasonable solution is to back them up
somewhere else, too.
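To illustrate the space trade-off, here is a plain XOR parity scheme. This is a simplified stand-in, not what PAR2 does (PAR2 uses Reed-Solomon codes, which are more flexible). Ten data blocks plus one parity block is 10% overhead and can rebuild any single block known to be lost (for example, flagged by its checksum), but not damage spread across two blocks. The function names are hypothetical.

```python
def split_blocks(data, n):
    """Split data into n equal blocks, zero-padding the last one."""
    size = -(-len(data) // n)  # ceiling division
    padded = data.ljust(size * n, b"\0")
    return [padded[i * size:(i + 1) * size] for i in range(n)]


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


def recover(blocks, parity, lost_index):
    """Rebuild the block at lost_index from the other blocks plus parity."""
    survivors = [b for i, b in enumerate(blocks) if i != lost_index]
    return xor_blocks(survivors + [parity])
```

With 200 bytes split into 10 blocks of 20 bytes, the parity block adds 20 bytes (10%). Zeroing block 3 is fully recoverable; zeroing blocks 3 and 5 is not, which is the "will that be sufficient?" worry in concrete form.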
