Hadoop's philosophy is to deploy on low-cost disks and keep 3 copies of the data for redundancy. This keeps costs very low, perhaps 5 to 10 times lower than what large enterprises pay for expensive SAN configurations.
This does not mean one needs to waste storage. If you store files compressed with gzip, multiple versions of a row may compress very well.

Kevin

On Fri, May 7, 2010 at 10:14 AM, tsuna <tsuna...@gmail.com> wrote:
> In addition to what Ryan said, even if the default maximum number of
> versions for a cell is 3 doesn't mean that you end up wasting space.
> If you only ever write one version, that's what you end up paying for.
>
> --
> Benoit "tsuna" Sigoure
> Software Engineer @ www.StumbleUpon.com
>
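
P.S. A rough way to see the compression point for yourself: the sketch below gzips one version of a made-up row and then three near-identical versions of it, and compares the sizes. The row contents and the helper names (make_versions, sizes) are hypothetical, and this is plain Python gzip on JSON text, not HBase's own on-disk format, so treat it as an illustration of the idea only.

```python
import gzip
import json


def make_versions(n_versions):
    """Build n_versions of a hypothetical row that differ in a single field."""
    rows = []
    for v in range(n_versions):
        row = {
            "row_key": "user:12345",          # made-up key, for illustration
            "name": "Alice Example",
            "email": "alice@example.com",
            "city": "San Francisco",
            "visits": 100 + v,                # the only field that changes
        }
        rows.append(json.dumps(row, sort_keys=True))
    return "\n".join(rows).encode("utf-8")


def sizes(n_versions):
    """Return (raw_bytes, gzipped_bytes) for n_versions of the row."""
    raw = make_versions(n_versions)
    return len(raw), len(gzip.compress(raw))


raw1, gz1 = sizes(1)
raw3, gz3 = sizes(3)

print("1 version :", raw1, "bytes raw,", gz1, "bytes gzipped")
print("3 versions:", raw3, "bytes raw,", gz3, "bytes gzipped")
```

The raw size grows roughly linearly with the number of versions, while the gzipped size grows much more slowly, because the later versions mostly repeat bytes the compressor has already seen.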