If compression is used, the overhead of versioning is not significant. Many
people want versioning of data for many reasons, including auditing and
compliance. In some database systems, analyzing data is effective only if
it is performed on the same version.
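
A rough way to check the compression claim is the toy sketch below. It is
not HBase code and does not model HBase's on-disk format; the row layout,
the field names and the helpers make_versions and gzip_size are all
hypothetical. It only compares the gzip size of one version of a row with
the gzip size of three nearly identical versions stored back to back.

```python
import gzip
import random


def make_versions(n_versions, n_fields=200, seed=0):
    """Build n_versions of a made-up row; each version changes one field."""
    rng = random.Random(seed)
    # Hypothetical row: 200 fields holding poorly compressible hex values.
    base = [f"field{i}={rng.getrandbits(64):016x}" for i in range(n_fields)]
    versions = []
    for v in range(n_versions):
        row = list(base)
        row[v] = f"field{v}=updated{v}"  # the only difference between versions
        versions.append(";".join(row).encode("utf-8"))
    return versions


def gzip_size(chunks):
    """Size in bytes of the chunks concatenated and gzip-compressed together."""
    return len(gzip.compress(b"".join(chunks)))


versions = make_versions(3)
raw_three = sum(len(v) for v in versions)  # uncompressed size of 3 versions
one = gzip_size(versions[:1])              # one version, compressed
three = gzip_size(versions)                # three versions, compressed together

print("raw size of 3 versions:", raw_three)
print("gzip size of 1 version:", one)
print("gzip size of 3 versions:", three)
```

In this setup the second and third versions mostly repeat bytes gzip has
already seen, so they add far less than a full copy each. Real savings
depend on how much the versions differ and on how the store groups data
before compressing it.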

I agree that if there is no need for it, versioning should be turned off.

Kevin



On Fri, May 7, 2010 at 10:49 AM, Takayuki Tsunakawa <
tsunakawa.ta...@jp.fujitsu.com> wrote:

> Hello, Kevin-san
>
> Yes, Hadoop DFS maintains three copies of the same data (version) at
> the file system level. What I'm wondering about is the necessity of
> different versions of cells by HBase at the database level.
> Amazon SimpleDB, Microsoft Azure Table, and Google App Engine
> Datastore do not provide versioning. So I felt that many people do not
> have to use versioning and the default maximum versions of HBase had
> better be
>
> Regards
> Takayuki
>
>
> ----- Original Message -----
> From: "Kevin Apte" <technicalarchitect2...@gmail.com>
> To: <hbase-user@hadoop.apache.org>
> Sent: Friday, May 07, 2010 1:51 PM
> Subject: Re: How is column timestamp useful?
>
>
> > The Hadoop philosophy is to deploy on low-cost disks and keep 3 copies
> > of data for redundancy. This ensures that the costs are very low,
> > perhaps 5 to 10 times lower than what large enterprises are paying for
> > expensive SAN configurations.
> >
> > This does not mean one needs to waste storage. If you store files
> > compressed using gzip, multiple versions of a row may compress very
> > well.
> >
> > Kevin
> >
> >
> >
> > On Fri, May 7, 2010 at 10:14 AM, tsuna <tsuna...@gmail.com> wrote:
> >
> >> In addition to what Ryan said, the fact that the default maximum
> >> number of versions for a cell is 3 doesn't mean that you end up
> >> wasting space. If you only ever write one version, that's what you
> >> end up paying for.
> >>
> >> --
> >> Benoit "tsuna" Sigoure
> >> Software Engineer @ www.StumbleUpon.com
> >>
> >
>
>
>
