Re: [rfc] File::Corruption

2004-10-18 Thread Joshua Hoblitt
On Sun, 17 Oct 2004, Christopher Hicks wrote:

 On Sun, 17 Oct 2004, Joshua Hoblitt wrote:
  is the namespace appropriate?
 
 I'd rather see it called something like File::DetectCorruption or 
 something that makes it clear that your module isn't here to corrupt 
 files.

That seems like a little too much typing for my tastes.  File::Corruption,
File::CheckSum or the like sounds better to me. 

 I've had good luck with SATA, but I don't use RAID controllers since I'd 
 rather put the money into more drives and let Linux do the RAID.  My 
 desktop box is an Opteron/SATA/Linux/LVM box with Linux doing RAID1 across 
 two drives and it's absolutely fabulous.  :)

The Linux md driver is fine for desktops or data that you don't really care
about.  However, as of vanilla 2.6.8, it does not support bad block remapping.
That means it is easy (even trivial) to lose data.

Let's take your RAID1 setup as an example; I'll assume that you have 2 disks in
your array.  Say that you have a block go bad in the middle of some data.
Assuming this fault is detectable, which will only happen if the disk is
physically unable to read the block as there is no checksumming under RAID1,
the md driver will read the data off of the other disk.  Without remapping,
the bad block stays bad, so that one copy is now the only good one.  Now, let's
suppose that the other disk dies.  Ouch - that corruption is non-recoverable.

If you had been using a hardware RAID controller that remaps bad blocks on the
fly (like 3Ware controllers), the first time that bad block was encountered
the controller would have marked the physical block as bad and recovered the
data from the good disk.  The downside is that most hardware SATA RAID
controllers have pretty pathetic write performance.

Something else to worry about is that most hard disks have a read bit error
rate of around 1 in 10^14 bits.  These misreads will be completely missed by
RAID1 but caught and corrected by RAID5 (even in software).
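
To put that error rate in perspective, here is a rough back-of-the-envelope
sketch.  It assumes the 1-in-10^14 figure can be treated as a simple average
per bit read (real drives are unlikely to behave that neatly), and the helper
name expected_read_errors is made up for illustration.

```perl
use strict;
use warnings;

# Assumed read error rate: 1 error per 1e14 bits read (figure from the text).
my $BITS_PER_ERROR = 1e14;

# Expected number of read errors after reading $bytes bytes, treating the
# quoted rate as a plain average (an assumption, not a drive specification).
sub expected_read_errors {
    my ($bytes) = @_;
    return ( $bytes * 8 ) / $BITS_PER_ERROR;
}

my $TB = 1e12;    # decimal terabyte, in bytes
my $PB = 1e15;    # decimal petabyte, in bytes

printf "%-6s %5.1f expected errors per full read\n",
    '10 TB:', expected_read_errors( 10 * $TB );
printf "%-6s %5.1f expected errors per full read\n",
    '1 PB:', expected_read_errors($PB);
```

Under those assumptions, one full pass over 10 terabytes works out to a bit
under one expected error, and one pass over a petabyte to about 80.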

For most people these issues are so trivial as to not even warrant
consideration.  However, once you get into the 10+ terabyte range, or have data
whose integrity you *really* care about, these are some of the issues that
you have to worry about.  I have over a petabyte of data to worry about and I
*really* care about data integrity.  File::Corruption is just a small
userland piece of a larger data redundancy and integrity plan.

Cheers,

-J

--


Re: [rfc] File::Corruption

2004-10-18 Thread Randy W. Sims
Joshua Hoblitt wrote:
 On Sun, 17 Oct 2004, Christopher Hicks wrote:

  On Sun, 17 Oct 2004, Joshua Hoblitt wrote:
   is the namespace appropriate?

  I'd rather see it called something like File::DetectCorruption or
  something that makes it clear that your module isn't here to corrupt
  files.

 That seems like a little too much typing for my tastes.  File::Corruption,
 File::CheckSum or the like sounds better to me.

File::Check
File::Verify
File::Validate
???
http://thesaurus.reference.com/search?q=check


Re: [rfc] File::Corruption

2004-10-18 Thread Joshua Hoblitt
On Sun, 17 Oct 2004, Joshua Hoblitt wrote:

 On Mon, 18 Oct 2004, Randy W. Sims wrote:
 
  Joshua Hoblitt wrote:
   On Sun, 17 Oct 2004, Christopher Hicks wrote:
   
   
  On Sun, 17 Oct 2004, Joshua Hoblitt wrote:
  
  is the namespace appropriate?
  
  I'd rather see it called something like File::DetectCorruption or 
  something that makes it clear that your module isn't here to corrupt 
  files.
   
   
   That seems like a little too much typing for my tastes.  File::Corruption,
   File::CheckSum or the like sounds better to me. 
  
  File::Check
  File::Verify
  File::Validate
  
  ???
 
 File::Health 
 File::Monitor
 
 ???
 
 File::Paranoid 
 
 :)

This is sort of a side note, but I really like File::Integrity.  Although, I
sort of feel that namespace should be a module that will check the integrity of
any file, whereas my code is somewhat more specialized.  Perhaps a long-term
goal will be to factor File::Integrity out of my code, but I still need to
find a suitable namespace for my more specialized code.

-J

--


Re: [rfc] File::Corruption

2004-10-18 Thread Chris Josephes
How does the code check the integrity of the file?  I mean, is there any
hardware/driver code thrown in here?  Is it specific to systems using
SATA controllers?  That might affect namespace ideas.

I liked the File::Integrity suggestion, but I'd want to know more.

On Sun, 17 Oct 2004, Joshua Hoblitt wrote:

 Hi Folks,

 This is a module that I wrote for in-house use as I am somewhat apprehensive
 about the reliability of low-end SATA RAID controllers.  Admittedly, this
 module scratches a rather niche itch.  My two questions are a) is this
 functionality general enough to warrant placing it on CPAN and b) is the
 namespace appropriate?

 Cheers,

 -J

 --
 NAME
 File::Corruption - Detect file corruption

 SYNOPSIS
 use File::Corruption;
 use File::Find::Rule;

 my $checker = File::Corruption->new(
     db        => './test.yml',
     verbose   => 1,
     autoflush => 1,
 );

 my $checker2 = $checker->clone;

 my @files   = File::Find::Rule->file->name( '*' )->in( '.' );
 my $added   = $checker->add( \@files );
 my $bad     = $checker->check( \@files );
 my $deleted = $checker->delete( \@files );

 print "has file\n" if $checker->has( qw( foo ) );

 $checker->save;

 DESCRIPTION
 This module attempts to detect file corruption caused by errors in the
 storage medium. The design philosophy is very different from intrusion
 detection systems like Tripwire and AIDE. While both of those well-known
 systems will detect and report file corruption, they will also detect
 (and report) almost *any* file modification. In contrast, this module
 attempts to *stay out of your face* by ignoring intentional file
 modification and only reporting files that have had bit values
 *silently* changed.

 File corruption is detected by recording a file's mtime and its SHA1
 checksum in a persistent database. The next time a file is inspected
 by check, the file's current mtime is compared to the value stored in
 the database. If the mtimes are the same but the checksum has changed,
 then the file is said to be corrupted. If the mtimes are different,
 then the new mtime and checksum are recorded in the database.
 Obviously, this technique is NOT suitable for intrusion detection.
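
The mtime-plus-checksum comparison described above can be sketched roughly as
follows.  This is only an illustration of the technique and not the module's
actual code: the helper names file_state and check_file, the in-memory hash
standing in for the YAML database, and the returned status strings are all
assumptions.  It uses the core Digest::SHA module for the SHA1 digest.

```perl
use strict;
use warnings;
use Digest::SHA;

# Hypothetical helper: return the current mtime and SHA-1 hex digest of a file.
sub file_state {
    my ($path) = @_;
    my $mtime = ( stat $path )[9];
    die "cannot stat $path: $!" unless defined $mtime;
    my $sha1 = Digest::SHA->new(1)->addfile( $path, 'b' )->hexdigest;
    return { mtime => $mtime, sha1 => $sha1 };
}

# Hypothetical check: $db is a hashref of path => { mtime, sha1 }.
# Returns 'added', 'updated', 'ok', or 'corrupt'.
sub check_file {
    my ( $db, $path ) = @_;
    my $now = file_state($path);
    my $old = $db->{$path};

    if ( !$old ) {
        $db->{$path} = $now;    # first sighting: just record it
        return 'added';
    }
    if ( $old->{mtime} != $now->{mtime} ) {
        $db->{$path} = $now;    # mtime moved: assume an intentional change
        return 'updated';
    }

    # Same mtime: the contents should not have changed.
    return $old->{sha1} eq $now->{sha1} ? 'ok' : 'corrupt';
}
```

A caller would loop over its file list, call check_file for each path, and
collect whatever comes back as 'corrupt'.  Anything that rewrites a file and
then restores its mtime would be flagged here too, which is part of why the
approach is aimed at silent media errors rather than intrusion detection.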

 USAGE
   Import Parameters
 This module accepts no arguments to its import method and exports no
 *symbols*.

   Methods
Constructors
 * new(...)
 Accepts a mandatory hash and returns a File::Corruption object.

 my $checker = File::Corruption->new(
     db        => './foo.yml',
     verbose   => 1,
     autoflush => 1
 );

 * db
 A file path to either a pre-existing File::Corruption YAML
 database or a location where a new database can be created. A
 pre-existing database must be writable. If a path to a new
 database is specified the directory must already exist (new
 directories will not be automatically created) and have
 permissions that allow file creation.

 * verbose
 A boolean value (0, 1, undef). When true, corrupt, non-existent, and
 non-plain files are reported on STDERR.

 This key is optional.

 * autoflush
 A boolean value (0, 1, undef). When set to true, check will
 flush any files from the database that were not passed in to be
 tested. This behavior is on a per invocation basis.

 This key is optional.

 * clone
 This object method returns a replica of the given object.

Object Methods
 * add
 Accepts either a filename or an arrayref to filenames that will be
 added to the File::Corruption database.

 Returns a list of File::Corruption::Stat objects representing files
 actually added to the database. In scalar context returns either an
 arrayref to File::Corruption::Stat objects or undef if no files were
 added.

 * check
 Accepts either a filename or an arrayref to filenames that will be
 checked against the File::Corruption database. Filenames that don't
 already exist in the database will be automatically added.

 Returns a list of File::Corruption::Detected objects representing
 files that are suspected to have been corrupted. In scalar context
 returns either an arrayref to File::Corruption::Detected objects or
 undef if no corrupt files were detected.

 * delete
 Accepts either a filename or an arrayref to filenames that will be
 deleted from the File::Corruption database.

 Returns a list of File::Corruption::Stat objects representing the
 files actually deleted from the database. In scalar context returns
 

Re: [rfc] File::Corruption

2004-10-18 Thread Lincoln A. Baxter
On Mon, 2004-10-18 at 05:02, Joshua Hoblitt wrote:
 This is sort of a side note, but I really like File::Integrity.  Although, I
 sort of feel that namespace should be a module that will check the integrity of
 any file, whereas my code is somewhat more specialized.  Perhaps a long-term
 goal will be to factor File::Integrity out of my code, but I still need to
 find a suitable namespace for my more specialized code.
 

Ok, how about File::SATA::Integrity



Re: [rfc] File::Corruption

2004-10-18 Thread Christopher Hicks
On Mon, 18 Oct 2004, Lincoln A. Baxter wrote:
 Ok, how about File::SATA::Integrity

His motivation was SATA, but the resulting solution isn't SATA-specific.
--
/chris
Documentation is like sex: when it is good, it is very, very good;
and when it is bad, it is better than nothing.  -- Dick Brandon