On Aug 3, 2010, at 9:12 AM, Eric Sammer wrote:

<snip/>
>
> All of that said, what you're protecting against here is permanent loss of a
> data center and human error. Disk, rack, and node level failures are already
> handled by HDFS when properly configured.

You've forgotten a third cause of loss: undiscovered software bugs.

The downside of spinning disks is that one completely fatal bug can destroy all your data in about a minute. (At my site, I famously deleted about 100 TB in 10 minutes with a scratch-space cleanup script gone awry. That was one nasty bug.) This is why we keep good backups.

If you're very, very serious about archiving and have a huge budget, you would invest a few million in tape silos at multiple sites, flip the write-protection tab on the tapes, eject them, and send them off to secure facilities.

This isn't for everyone though :)

Brian
