If compression is used, the overhead of versioning is not significant. Many people want versioning of data for many reasons, including auditing and compliance. In some database systems, analyzing data is effective only if it is performed on a single, consistent version of the data.
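As a rough illustration of the compression point, here is a minimal sketch. It is not an HBase measurement: the row layout and the helper names (`make_versions`, `compressed_size`) are made up for this example, and Python's `zlib` (DEFLATE, the algorithm gzip uses) stands in for whatever compression the store files are configured with. It only shows that several near-identical versions of a row tend to compress to much less than several times the size of one version.

```python
import zlib

# Hypothetical row layout, invented for illustration; not an HBase format.
ROW_TEMPLATE = (
    "row-0001|name=Alice Example|email=alice@example.com|"
    "city=Tokyo|status=active|balance={balance}"
)


def make_versions(count: int) -> list:
    """Return `count` versions of one row that differ only in one field."""
    return [
        ROW_TEMPLATE.format(balance=1000 + i).encode("utf-8")
        for i in range(count)
    ]


def compressed_size(data: bytes) -> int:
    """Size in bytes after DEFLATE compression (the algorithm gzip uses)."""
    return len(zlib.compress(data))


if __name__ == "__main__":
    one = make_versions(1)[0]
    three = b"\n".join(make_versions(3))
    print("1 version :", len(one), "raw,", compressed_size(one), "compressed")
    print("3 versions:", len(three), "raw,", compressed_size(three), "compressed")
```

The second and third versions mostly repeat bytes the compressor has already seen, so they add little to the compressed size. How much real tables gain depends on the data and on the compression settings actually in use.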
I agree that if there is no need, versioning should be turned off.

Kevin

On Fri, May 7, 2010 at 10:49 AM, Takayuki Tsunakawa <tsunakawa.ta...@jp.fujitsu.com> wrote:

> Hello, Kevin-san
>
> Yes, Hadoop DFS maintains three copies of the same data (version) at
> the file system level. What I'm wondering about is the necessity of
> different versions of cells by HBase at the database level.
> Amazon SimpleDB, Microsoft Azure Table, and Google App Engine
> Datastore do not provide versioning. So I felt that many people do not
> have to use versioning and the default maximum versions of HBase had
> better be
>
> Regards
> Takayuki
>
>
> ----- Original Message -----
> From: "Kevin Apte" <technicalarchitect2...@gmail.com>
> To: <hbase-user@hadoop.apache.org>
> Sent: Friday, May 07, 2010 1:51 PM
> Subject: Re: How is column timestamp useful?
>
>
> > Hadoop philosophy is to deploy on low cost disks and keep 3 copies of data
> > for redundancy. This ensures that the costs are very low- perhaps 5 to 10
> > times lower than what large Enterprises are paying for expensive SAN
> > configurations.
> >
> > This does not mean one needs to waste storage- If you store files
> > compressed using gZip, multiple versions of a row may compress very well.
> >
> > Kevin
> >
> > On Fri, May 7, 2010 at 10:14 AM, tsuna <tsuna...@gmail.com> wrote:
> >
> >> In addition to what Ryan said, even if the default maximum number of
> >> versions for a cell is 3 doesn't mean that you end up wasting space.
> >> If you only ever write one version, that's what you end up paying for.
> >>
> >> --
> >> Benoit "tsuna" Sigoure
> >> Software Engineer @ www.StumbleUpon.com