You need KEEP_DELETED_CELLS => 'true'.

On Mon, Aug 25, 2014 at 6:13 PM, tobe <[email protected]> wrote:

> @lars I have set {KEEP_DELETED_CELLS => 'false'} in that table. I will get
> the same result before manually running `flush`. You can try the commands I
> gave and it's 100% repro.
>
> On Tue, Aug 26, 2014 at 2:20 AM, lars hofhansl <[email protected]> wrote:
>
> > Queries of past time ranges only work correctly when KEEP_DELETED_CELLS
> > is enabled for the column families.
> >
> > ________________________________
> > From: tobe <[email protected]>
> > To: hbase-dev <[email protected]>
> > Cc: "[email protected]" <[email protected]>
> > Sent: Monday, August 25, 2014 4:32 AM
> > Subject: Re: Should scan check the limitation of the number of versions?
> >
> > I haven't read the code deeply but I have an idea (not sure whether it's
> > right or not). When we scan the columns, we will skip the ones which
> > don't match (deleted). Can we use a counter to record this? For each
> > skip, we add one until it reaches the restrictive number of versions.
> > But we have to consider mvcc and others, which seems more complex.
> >
> > On Mon, Aug 25, 2014 at 5:54 PM, tobe <[email protected]> wrote:
> >
> > > So far, I have found two problems about this.
> > >
> > > Firstly, HBase-11675 <https://issues.apache.org/jira/browse/HBASE-11675>.
> > > It's a little tricky and rarely happens. But it asks users to be
> > > careful of compaction which occurs on server side. They may get
> > > different results before and after the major compaction.
> > >
> > > Secondly, if you put a value with timestamp 100, then put another
> > > value on the same column with timestamp 200. Here we set the number of
> > > versions as 1. So when we get the value of this column, we will get
> > > the latest one with timestamp 200 and that's right. But if I get with
> > > a timerange from 0 to 150, I may get the first value with timestamp
> > > 100 before compaction happens. And after compaction happens, you will
> > > never get this value even if you run the same command.
> > >
> > > It's easy to repro, follow these steps:
> > > hbase(main):001:0> create "table", "cf"
> > > hbase(main):003:0> put "table", "row1", "cf:a", "value1", 100
> > > hbase(main):003:0> put "table", "row1", "cf:a", "value1", 200
> > > hbase(main):026:0> get "table", "row1", {TIMERANGE => [0, 150]} // before flush
> > > row1 column=cf:a, timestamp=100, value=value1
> > > hbase(main):060:0> flush "table"
> > > hbase(main):082:0> get "table", "row1", {TIMERANGE => [0, 150]} // after flush
> > > 0 row(s) in 0.0050 seconds
> > >
> > > I think the reason for that is we have three restrictions to remove
> > > data: delete, ttl and versions. Any time we get or scan the data, we
> > > will check the delete mark and ttl to make sure it will not return to
> > > users. But for versions, we don't check this limitation. Our output
> > > relies on the compaction to clean up the overdue data. Is it possible
> > > to add this condition within scan (get is implemented as scan)?

--
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)
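
Editor's sketch: the before-flush and after-flush difference reported in the thread, and the "count the skipped versions" idea, can be illustrated with a small toy model. This is not HBase code. The class `ToyColumn` and its methods `get` and `get_counting_skips` are hypothetical names invented for illustration, the half-open time range is an assumption of the model, and mvcc, deletes and TTL are ignored entirely.

```python
from dataclasses import dataclass, field


@dataclass
class ToyColumn:
    """Toy model of a single column's versions. NOT HBase code."""

    max_versions: int = 1
    cells: dict = field(default_factory=dict)  # timestamp -> value

    def put(self, ts, value):
        self.cells[ts] = value

    def flush(self):
        # Model of flush/compaction: physically keep only the newest
        # max_versions cells, regardless of any query time range.
        newest = sorted(self.cells, reverse=True)[: self.max_versions]
        self.cells = {ts: self.cells[ts] for ts in newest}

    def get(self, min_ts, max_ts):
        # Behaviour reported in the thread: cells outside the time range
        # are skipped first, and the version limit is applied only to
        # what remains. Range is modelled as [min_ts, max_ts).
        in_range = [
            ts for ts in sorted(self.cells, reverse=True)
            if min_ts <= ts < max_ts
        ]
        return [(ts, self.cells[ts]) for ts in in_range[: self.max_versions]]

    def get_counting_skips(self, min_ts, max_ts):
        # Idea floated in the thread: count every version seen, including
        # the ones skipped by the time range, and stop at max_versions.
        result = []
        seen = 0
        for ts in sorted(self.cells, reverse=True):
            if seen >= self.max_versions:
                break
            seen += 1
            if min_ts <= ts < max_ts:
                result.append((ts, self.cells[ts]))
        return result


col = ToyColumn(max_versions=1)
col.put(100, "value1")
col.put(200, "value1")
print(col.get(0, 150))                 # before flush: [(100, 'value1')]
print(col.get_counting_skips(0, 150))  # before flush: []
col.flush()
print(col.get(0, 150))                 # after flush: []
```

In this model `get` changes its answer across `flush`, while `get_counting_skips` returns the same (empty) result both times, which is the consistency the original question asks about. Whether such a counter is practical inside the real scanner is exactly the open point raised in the thread.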
