Re: Should scan check the limitation of the number of versions?

tobe Mon, 25 Aug 2014 18:37:26 -0700

@andrew Actually, I don't want to see the row for TIMERANGE => [0, 150],
because that version is overdue. Should I set {KEEP_DELETED_CELLS => 'true'}?
My problem is that even though I don't keep deleted cells, I still get a
result that I don't expect.



On Tue, Aug 26, 2014 at 9:24 AM, Andrew Purtell <[email protected]> wrote:

> On Mon, Aug 25, 2014 at 6:13 PM, tobe <[email protected]> wrote:
>
> > @lars I have set {KEEP_DELETED_CELLS => 'false'} in that table. I will get
> > the same result before manually running `flush`. You can try the commands I
> > gave and it's 100% repro.
> >
>
> You need KEEP_DELETED_CELLS => 'true'. 
>
>
>
> On Mon, Aug 25, 2014 at 6:13 PM, tobe <[email protected]> wrote:
>
> > @lars I have set {KEEP_DELETED_CELLS => 'false'} in that table. I will get
> > the same result before manually running `flush`. You can try the commands I
> > gave and it's 100% repro.
> >
> >
> > On Tue, Aug 26, 2014 at 2:20 AM, lars hofhansl <[email protected]> wrote:
> >
> > > Queries of past time ranges only work correctly when KEEP_DELETED_CELLS is
> > > enabled for the column families.
> > >
> > >
> > > ________________________________
> > >  From: tobe <[email protected]>
> > > To: hbase-dev <[email protected]>
> > > Cc: "[email protected]" <[email protected]>
> > > Sent: Monday, August 25, 2014 4:32 AM
> > > Subject: Re: Should scan check the limitation of the number of versions?
> > >
> > >
> > > I haven't read the code deeply, but I have an idea (not sure whether it's
> > > right or not). When we scan the columns, we skip the ones that don't
> > > match (deleted). Can we use a counter to record this? For each skip, we
> > > add one until it reaches the configured limit on the number of versions.
> > > But we have to consider MVCC and other factors, which seems more complex.
> > >
> > >
> > >
> > >
> > >
> > > On Mon, Aug 25, 2014 at 5:54 PM, tobe <[email protected]> wrote:
> > >
> > > > So far, I have found two problems about this.
> > > >
> > > > Firstly, HBASE-11675
> > > > <https://issues.apache.org/jira/browse/HBASE-11675>. It's a little
> > > > tricky and rarely happens, but it requires users to be careful about
> > > > compaction, which occurs on the server side. They may get different
> > > > results before and after a major compaction.
> > > >
> > > > Secondly, suppose you put a value with timestamp 100, then put another
> > > > value on the same column with timestamp 200, with the number of
> > > > versions set to 1. When we get the value of this column, we get the
> > > > latest one, with timestamp 200, and that's right. But if I get with a
> > > > time range from 0 to 150, I may get the first value, with timestamp 100,
> > > > before compaction happens. After compaction happens, you will never
> > > > get this value, even if you run the same command.
> > > >
> > > > It's easy to repro; follow these steps:
> > > > hbase(main):001:0> create "table", "cf"
> > > > hbase(main):003:0> put "table", "row1", "cf:a", "value1", 100
> > > > hbase(main):003:0> put "table", "row1", "cf:a", "value1", 200
> > > > hbase(main):026:0> get "table", "row1", {TIMERANGE => [0, 150]}  // before flush
> > > >    row1      column=cf:a, timestamp=100, value=value1
> > > > hbase(main):060:0> flush "table"
> > > > hbase(main):082:0> get "table", "row1", {TIMERANGE => [0, 150]}  // after flush
> > > >    0 row(s) in 0.0050 seconds
> > > >
> > > > I think the reason is that we have three restrictions that remove data:
> > > > deletes, TTL and versions. Any time we get or scan the data, we check
> > > > the delete markers and TTL to make sure such data is not returned to
> > > > users. But for versions, we don't check this limit. Our output relies
> > > > on compaction to clean up the overdue data. Is it possible to add this
> > > > condition within scan (get is implemented as scan)?
> > > >
> > >
> >
>
>
>
> --
> Best regards,
>
>    - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)
>
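
As an illustration of the counter idea quoted above, here is a minimal Python
sketch. It is a toy model, not HBase code: Cell, scan_without_version_check,
scan_with_version_check and flush are hypothetical names made up for this
example, and it deliberately ignores delete markers, TTL and MVCC. It assumes
the number of versions is 1, treats the time range as [min, max), and assumes
that flush trims each column to the newest versions, as the repro in the
thread suggests.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """One version of a single column (hypothetical toy type)."""
    timestamp: int
    value: str


def newest_first(cells):
    """Order versions the way a scan visits them: newest timestamp first."""
    return sorted(cells, key=lambda c: c.timestamp, reverse=True)


def scan_without_version_check(cells, min_ts, max_ts):
    """Apply only the time range [min_ts, max_ts); no version limit."""
    return [c for c in newest_first(cells) if min_ts <= c.timestamp < max_ts]


def scan_with_version_check(cells, min_ts, max_ts, max_versions):
    """Count every version visited, even those outside the time range,
    and stop once max_versions versions have been seen."""
    result = []
    seen = 0
    for cell in newest_first(cells):
        if seen >= max_versions:
            break  # everything older is an overdue version
        seen += 1
        if min_ts <= cell.timestamp < max_ts:
            result.append(cell)
    return result


def flush(cells, max_versions):
    """Assumed stand-in for flush/compaction: keep the newest versions only."""
    return newest_first(cells)[:max_versions]


cells = [Cell(100, "value1"), Cell(200, "value1")]

# Without the check, the answer changes once the data is flushed.
before_flush = scan_without_version_check(cells, 0, 150)
after_flush = scan_without_version_check(flush(cells, 1), 0, 150)

# With the check, the overdue version is hidden in both cases.
checked_before = scan_with_version_check(cells, 0, 150, 1)
checked_after = scan_with_version_check(flush(cells, 1), 0, 150, 1)
```

In this model the unchecked scan returns the timestamp-100 cell before the
flush and nothing after it, while the checked scan returns nothing both
times. Whether such a counter can be added to the real scanner without
affecting MVCC or performance is exactly the open question in the thread.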
