Re: [rfc] File::Corruption
On Sun, 17 Oct 2004, Christopher Hicks wrote:
> On Sun, 17 Oct 2004, Joshua Hoblitt wrote:
>> is the namespace appropriate?
>
> I'd rather see it called something like File::DetectCorruption or
> something that makes it clear that your module isn't here to corrupt
> files.

That seems like a little too much typing for my tastes. File::Corruption,
File::CheckSum or the like sounds better to me.

> I've had good luck with SATA, but I don't use RAID controllers since
> I'd rather put the money into more drives and let Linux do the RAID.
> My desktop box is an Opteron/SATA/Linux/LVM box with Linux doing RAID1
> across two drives and it's absolutely fabulous. :)

The Linux md driver is fine for desktops or data that you don't really
care about. However, as of vanilla 2.6.8, it does not support bad block
remapping. That means it is easy (even trivial) to lose data.

Let's take your RAID1 setup as an example; I'll assume that you have 2
disks in your array. Say that you have a block go bad in the middle of
some data. Assuming this fault is detectable, which will only happen if
the disk is physically unable to read the block (there is no checksumming
under RAID1), the md driver will read the data off of the other disk. Now,
let's suppose that the other disk dies. Ouch - the only remaining copy of
that block is the unreadable one, so that corruption is non-recoverable.

If you had been using a hardware RAID controller that remaps bad blocks on
the fly (like 3Ware controllers), then the first time that bad block was
encountered the controller would have marked the physical block as bad and
recovered the data from the good disk. The downside is that most hardware
SATA RAID controllers have pretty pathetic write performance.

Something else to worry about is that most hard disks have a read bit
error rate of around 1 in 10^14 bits. These misreads will go completely
undetected by RAID1 but will be caught and corrected by RAID5 (even in
software).

For most people these issues are so trivial as to not even warrant
consideration. However, once you get into the 10+ terabyte range, or have
data whose integrity you *really* care about, these are some of the issues
that you have to worry about. I have over a petabyte of data to worry
about and I *really* care about data integrity. File::Corruption is just a
small userland piece of a larger data redundancy and integrity plan.

Cheers,

-J
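
To put the "1 in 10^14 bits" figure into perspective at the data sizes mentioned above, here is a rough back-of-the-envelope sketch. It assumes misreads are independent, that 1 TB means 10^12 bytes, and that the quoted rate applies uniformly; the helper name expected_bit_errors is made up for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: expected number of misread bits when every byte
# is read exactly once, assuming independent errors at a rate of one
# per $bits_per_error bits.
sub expected_bit_errors {
    my ($bytes, $bits_per_error) = @_;
    return $bytes * 8 / $bits_per_error;
}

my $bits_per_error = 1e14;    # "1 in 10^14 bits", as quoted in the thread

# Sizes are assumptions: decimal terabytes and petabytes.
foreach my $case ( [ '10 TB', 10e12 ], [ '1 PB', 1e15 ] ) {
    my ($label, $bytes) = @$case;
    printf "%-6s %g expected misread bits per full read\n",
        $label, expected_bit_errors($bytes, $bits_per_error);
}
```

Under those assumptions a single full read of 10 TB expects about 0.8 misread bits, and a full read of a petabyte expects about 80, which is why the issue is negligible for a desktop and hard to ignore at archive scale.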
Re: [rfc] File::Corruption
Joshua Hoblitt wrote:
> On Sun, 17 Oct 2004, Christopher Hicks wrote:
>> On Sun, 17 Oct 2004, Joshua Hoblitt wrote:
>>> is the namespace appropriate?
>>
>> I'd rather see it called something like File::DetectCorruption or
>> something that makes it clear that your module isn't here to corrupt
>> files.
>
> That seems like a little too much typing for my tastes.
> File::Corruption, File::CheckSum or the like sounds better to me.

File::Check
File::Verify
File::Validate
???

http://thesaurus.reference.com/search?q=check
Re: [rfc] File::Corruption
On Sun, 17 Oct 2004, Joshua Hoblitt wrote:
> On Mon, 18 Oct 2004, Randy W. Sims wrote:
>> Joshua Hoblitt wrote:
>>> On Sun, 17 Oct 2004, Christopher Hicks wrote:
>>>> On Sun, 17 Oct 2004, Joshua Hoblitt wrote:
>>>>> is the namespace appropriate?
>>>>
>>>> I'd rather see it called something like File::DetectCorruption or
>>>> something that makes it clear that your module isn't here to
>>>> corrupt files.
>>>
>>> That seems like a little too much typing for my tastes.
>>> File::Corruption, File::CheckSum or the like sounds better to me.
>>
>> File::Check
>> File::Verify
>> File::Validate
>> ???
>
> File::Health
> File::Monitor
> ???
>
> File::Paranoid :)

This is sort of a side note, but I really like File::Integrity. However, I
feel that namespace should belong to a module that will check the
integrity of any file, whereas my code is somewhat more specialized.
Perhaps a long-term goal will be to factor File::Integrity out of my code,
but I still need to find a suitable namespace for my more specialized
code.

-J
Re: [rfc] File::Corruption
How does the code check the integrity of the file? I mean, is there any
hardware/driver code thrown in here? Is it specific to systems using SATA
controllers? That might affect namespace ideas. I liked the
File::Integrity suggestion, but I'd want to know more.

On Sun, 17 Oct 2004, Joshua Hoblitt wrote:

Hi Folks,

This is a module that I wrote for in-house use as I am somewhat
apprehensive about the reliability of low-end SATA RAID controllers.
Admittedly, this module scratches a rather niche itch. My two questions
are a) is this functionality general enough to warrant placing it on CPAN
and b) is the namespace appropriate?

Cheers,

-J

NAME
    File::Corruption - Detect file corruption

SYNOPSIS
        use File::Corruption;
        use File::Find::Rule;

        my $checker = File::Corruption->new(
            db        => './test.yml',
            verbose   => 1,
            autoflush => 1,
        );

        my $checker2 = $checker->clone;

        my @files = File::Find::Rule->file->name( '*' )->in( '.' );

        my $added   = $checker->add( \@files );
        my $bad     = $checker->check( \@files );
        my $deleted = $checker->delete( \@files );

        print "has file\n" if $checker->has( qw( foo ) );

        $checker->save;

DESCRIPTION
    This module attempts to detect file corruption caused by errors in the
    storage medium. The design philosophy is very different from intrusion
    detection systems like Tripwire and AIDE. While both of those
    well-known systems will detect and report file corruption, they will
    also detect (and report) almost *any* file modification. In contrast,
    this module attempts to *stay out of your face* by ignoring
    intentional file modification and only reporting files that have had
    bit values *silently* changed.

    File corruption is detected by recording a file's mtime and its SHA1
    checksum in a persistent database. The next time a file is inspected
    by check, the file's current mtime is compared to the value stored in
    the database. If the mtimes are the same but the checksum has changed,
    then the file is said to be corrupted. If the mtimes are different,
    then the new mtime and checksum are recorded in the database.

    Obviously, this technique is NOT suitable for intrusion detection.

USAGE
  Import Parameters
    This module accepts no arguments to its import method and exports no
    *symbols*.

  Methods
   Constructors
    * new(...)
        Accepts a mandatory hash and returns a File::Corruption object.

            my $checker = File::Corruption->new(
                db        => './foo.yml',
                verbose   => 1,
                autoflush => 1,
            );

        * db
            A file path to either a pre-existing File::Corruption YAML
            database or a location where a new database can be created. A
            pre-existing database must be writable. If a path to a new
            database is specified, the directory must already exist (new
            directories will not be automatically created) and have
            permissions that allow file creation.

        * verbose
            A boolean value (0, 1, undef). Causes corrupt, non-existent,
            and non-plain files to be reported to STDERR. This key is
            optional.

        * autoflush
            A boolean value (0, 1, undef). When set to true, check will
            flush any files from the database that were not passed in to
            be tested. This behavior is on a per-invocation basis. This
            key is optional.

    * clone
        This object method returns a replica of the given object.

   Object Methods
    * add
        Accepts either a filename or an arrayref of filenames that will be
        added to the File::Corruption database.

        Returns a list of File::Corruption::Stat objects representing the
        files actually added to the database. In scalar context returns
        either an arrayref of File::Corruption::Stat objects or undef if
        no files were added.

    * check
        Accepts either a filename or an arrayref of filenames that will be
        checked against the File::Corruption database. Filenames that
        don't already exist in the database will be automatically added.

        Returns a list of File::Corruption::Detected objects representing
        files that are suspected to have been corrupted. In scalar context
        returns either an arrayref of File::Corruption::Detected objects
        or undef if no corrupt files were detected.

    * delete
        Accepts either a filename or an arrayref of filenames that will be
        deleted from the File::Corruption database.

        Returns a list of File::Corruption::Stat objects representing the
        files actually deleted from the database. In scalar context returns
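
The mtime-plus-checksum rule from the DESCRIPTION above can be sketched in a few lines of Perl. This is only an illustration of the decision logic, not File::Corruption's actual implementation: the subs fingerprint and check_file, the returned status strings, and the in-memory hash standing in for the YAML database are all hypothetical, and the sketch assumes the core Digest::SHA module for the SHA-1 digest.

```perl
use strict;
use warnings;
use Digest::SHA;

# Hypothetical helper (not part of File::Corruption): returns the current
# mtime and SHA-1 hex digest of a file as a hashref.
sub fingerprint {
    my ($path) = @_;
    my $mtime = (stat $path)[9];
    die "cannot stat $path: $!" unless defined $mtime;
    # addfile() with mode 'b' reads the file in binary mode.
    my $sha1 = Digest::SHA->new(1)->addfile($path, 'b')->hexdigest;
    return { mtime => $mtime, sha1 => $sha1 };
}

# Hypothetical check: $db is a hashref mapping path => { mtime, sha1 },
# standing in for the persistent database.
# Returns 'added', 'ok', 'updated', or 'corrupt'.
sub check_file {
    my ($db, $path) = @_;
    my $now = fingerprint($path);
    my $old = $db->{$path};

    # Unknown files are recorded, mirroring the documented check behavior.
    if (!$old) {
        $db->{$path} = $now;
        return 'added';
    }

    # Same mtime: any checksum change is treated as silent corruption.
    if ($old->{mtime} == $now->{mtime}) {
        return $old->{sha1} eq $now->{sha1} ? 'ok' : 'corrupt';
    }

    # Different mtime: assume an intentional edit and record the new state.
    $db->{$path} = $now;
    return 'updated';
}
```

Calling check_file(\%db, $path) repeatedly on an untouched file returns 'added' once and 'ok' afterwards. Because anything that changes the mtime is accepted as an intentional edit, the sketch shares the limitation the POD points out: it is not suitable for intrusion detection.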
Re: [rfc] File::Corruption
On Mon, 2004-10-18 at 05:02, Joshua Hoblitt wrote:
> This is sort of a side note, but I really like File::Integrity.
> However, I feel that namespace should belong to a module that will
> check the integrity of any file, whereas my code is somewhat more
> specialized. Perhaps a long-term goal will be to factor
> File::Integrity out of my code, but I still need to find a suitable
> namespace for my more specialized code.

Ok, how about File::SATA::Integrity?
Re: [rfc] File::Corruption
On Mon, 18 Oct 2004, Lincoln A. Baxter wrote:
> Ok, how about File::SATA::Integrity

His motivation was SATA, but the resulting solution isn't SATA-specific.

--
/chris

Documentation is like sex: when it is good, it is very, very good; and
when it is bad, it is better than nothing. -- Dick Brandon