Hello, Kevin-san

Yes, Hadoop DFS maintains three copies of the same data (the same version) at the file system level. What I'm wondering about is whether HBase needs to keep different versions of a cell at the database level. Amazon SimpleDB, Microsoft Azure Table, and Google App Engine Datastore do not provide versioning. So I felt that many people do not need versioning, and that HBase's default maximum number of versions would be better set to 1.

Regards
Takayuki

----- Original Message -----
From: "Kevin Apte" <technicalarchitect2...@gmail.com>
To: <hbase-user@hadoop.apache.org>
Sent: Friday, May 07, 2010 1:51 PM
Subject: Re: How is column timestamp useful?

> Hadoop philosophy is to deploy on low cost disks and keep 3 copies of data
> for redundancy. This ensures that the costs are very low- perhaps 5 to 10
> times lower than what large Enterprises are paying for expensive SAN
> configurations.
>
> This does not mean one needs to waste storage- If you store files
> compressed using gZip, multiple versions of a row may compress very well.
>
> Kevin
>
> On Fri, May 7, 2010 at 10:14 AM, tsuna <tsuna...@gmail.com> wrote:
>
>> In addition to what Ryan said, even if the default maximum number of
>> versions for a cell is 3 doesn't mean that you end up wasting space.
>> If you only ever write one version, that's what you end up paying for.
>>
>> --
>> Benoit "tsuna" Sigoure
>> Software Engineer @ www.StumbleUpon.com
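
For what it's worth, tsuna's point quoted above (a maximum of 3 versions is a cap, not a fixed allocation) can be illustrated with a small toy model in Python. This is only a sketch and not HBase code: the class `VersionedStore` and its methods `put`, `get`, and `stored_versions` are hypothetical names made up for this illustration, and it assumes that versions beyond the maximum are simply dropped, oldest first.

```python
from collections import defaultdict


class VersionedStore:
    """Toy model (not HBase) of cells that keep at most max_versions values."""

    def __init__(self, max_versions=3):
        self.max_versions = max_versions
        # (row, column) -> list of (timestamp, value), newest first
        self._cells = defaultdict(list)

    def put(self, row, column, timestamp, value):
        versions = self._cells[(row, column)]
        versions.append((timestamp, value))
        # Keep the newest first, then drop anything beyond the cap.
        versions.sort(key=lambda tv: tv[0], reverse=True)
        del versions[self.max_versions:]

    def get(self, row, column):
        """Return the newest value of the cell, or None if it was never written."""
        versions = self._cells.get((row, column))
        return versions[0][1] if versions else None

    def stored_versions(self):
        """Total number of versions actually held across all cells."""
        return sum(len(v) for v in self._cells.values())


store = VersionedStore(max_versions=3)
store.put("row1", "cf:a", 1, "v1")       # written once: one version stored
for ts in range(1, 6):
    store.put("row2", "cf:a", ts, f"v{ts}")  # written five times: capped at three
```

In this model, a cell that is written once holds exactly one version regardless of the cap, while a cell that is overwritten five times holds three. Setting `max_versions=1` only changes the second case.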