On Tue, Jan 17, 2012 at 8:56 PM, lars hofhansl <[email protected]> wrote:

> The memstoreTS is used for visibility during an intra-row transaction.
> Are you proposing to do this only if the deletes/puts did not use the
> current time?
>
> The ability to define timestamps for all operations is crucial to HBase.
> o It ensures that HTable.batch works correctly (which reorders Deletes
> w.r.t. Puts at the Region Server).
> o It ensures that replication works correctly.
> o many other scenarios
>
> If you do not use application defined timestamp the current time is used
> and everything works as expected.
> If you use application defined timestamps you are asking for a delete to
> be either in the future or the past, and you have to understand what that
> means.
> Maybe we should document the behavior better.
>

I guess I am saying that I *do* understand the current "delete with TS"
behavior, and I find the current implementation unstable and
non-deterministic. Documenting it more thoroughly does not make it less
quirky or more stable.  I propose fixing it along the lines suggested in
option B.  Karthik seems to agree.
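
To make the difference concrete, here is a toy model of the TS=6 sequence
quoted below (ins val1, del row, optional compaction, ins val2, read). It is
only a sketch and not HBase code: the Cell class, the seq field standing in
for an "internal TS", and the simplified compaction that drops delete markers
together with the puts they mask are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only; this is not HBase code and every name here is hypothetical.
public class DeleteOrderingSketch {

    // One entry in a single row's history: a put, or a whole-row delete marker.
    static final class Cell {
        final boolean isDelete;
        final long clientTs;  // timestamp chosen by the client
        final long seq;       // assumed internal arrival order assigned by the server
        final String value;   // null for delete markers

        Cell(boolean isDelete, long clientTs, long seq, String value) {
            this.isDelete = isDelete;
            this.clientTs = clientTs;
            this.seq = seq;
            this.value = value;
        }
    }

    // A put is masked by a delete marker whose client TS is >= the put's TS.
    // With useSeq (the "option B" idea) the marker must also have arrived after the put.
    static boolean masked(Cell put, List<Cell> cells, boolean useSeq) {
        for (Cell d : cells) {
            if (d.isDelete && d.clientTs >= put.clientTs && (!useSeq || d.seq > put.seq)) {
                return true;
            }
        }
        return false;
    }

    // Returns the newest visible value (highest client TS, ties broken by arrival order), or null.
    static String read(List<Cell> cells, boolean useSeq) {
        Cell best = null;
        for (Cell c : cells) {
            if (c.isDelete || masked(c, cells, useSeq)) {
                continue;
            }
            if (best == null || c.clientTs > best.clientTs
                    || (c.clientTs == best.clientTs && c.seq > best.seq)) {
                best = c;
            }
        }
        return best == null ? null : best.value;
    }

    // Simplified compaction: drop delete markers together with the puts they mask.
    static List<Cell> compact(List<Cell> cells, boolean useSeq) {
        List<Cell> kept = new ArrayList<>();
        for (Cell c : cells) {
            if (!c.isDelete && !masked(c, cells, useSeq)) {
                kept.add(c);
            }
        }
        return kept;
    }

    // Replays the sequence: ins val1, del row, [compaction], ins val2, read. All at TS=6.
    static String replay(boolean compactInMiddle, boolean useSeq) {
        List<Cell> row = new ArrayList<>();
        row.add(new Cell(false, 6, 1, "val1"));
        row.add(new Cell(true, 6, 2, null));
        if (compactInMiddle) {
            row = compact(row, useSeq);
        }
        row.add(new Cell(false, 6, 3, "val2"));
        return read(row, useSeq);
    }

    public static void main(String[] args) {
        System.out.println("TS only, no compaction:   " + replay(false, false));
        System.out.println("TS only, with compaction: " + replay(true, false));
        System.out.println("TS+seq, no compaction:    " + replay(false, true));
        System.out.println("TS+seq, with compaction:  " + replay(true, true));
    }
}
```

In this model the TS-only rule gives null without the compaction and val2
with it, while the rule that also compares arrival order gives val2 in both
cases, which is the determinism option B is after.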




>
> -- Lars
>
>
> ----- Original Message -----
> From: Karthik Ranganathan <[email protected]>
> To: "[email protected]" <[email protected]>; lars hofhansl <
> [email protected]>
> Cc:
> Sent: Tuesday, January 17, 2012 3:27 PM
> Subject: Re: Delete client API.
>
>
> @Srivas - totally agree that B is the correct thing to do.
>
> One way we have talked about implementing this is using the memstore ts.
> Every insert of a KV into the memstore is given a memstore-ts. These are
> persisted only till they are needed (to ensure read atomicity for
> scanners) and then that value is zeroed out on a subsequent compaction
> (saves space). If we retained the memstore-ts even beyond these
> compactions, we could get a deterministic order for the puts and deletes
> (first insert ts < del ts < second insert ts).
>
> Thanks
> Karthik
>
>
> On 1/17/12 2:14 PM, "M. C. Srivas" <[email protected]> wrote:
>
> >On Tue, Jan 17, 2012 at 10:07 AM, lars hofhansl <[email protected]>
> >wrote:
> >
> >> Yeah, it's confusing if one expects it to work like in a relational
> >> database.
> >> You can even do worse. If you accidentally place a delete in the future,
> >> all current inserts will be hidden until the next major compaction. :)
> >> I got confused about this myself just recently (see my mail on the
> >> dev-list).
> >>
> >>
> >> In the end this is a pretty powerful feature and core to how HBase works
> >> (not saying that it is not confusing, though).
> >>
> >>
> >> If one keeps the following two points in mind it makes more sense:
> >> 1. Delete just sets a tombstone marker at a specific TS (marking
> >> everything older as deleted).
> >> 2. Everything is versioned, if no version is specified the current time
> >> (at the regionserver) is used.
> >>
> >> In your example1 below t3 > 6, hence the insert is hidden.
> >> In example2 both delete and insert TS are 6, hence the insert is hidden.
> >>
> >
> >Let's consider my example2 for a little longer. Sequence of events:
> >
> >   1.  ins  val1  with TS=6 set by client
> >   2.  del  entire row at TS=6 set by client
> >   3.  ins  val2  with TS=6  set by client
> >   4.  read row
> >
> >The row returns nothing even though the insert at step 3 happened after the
> >delete at step 2. (step 2 masks even future inserts)
> >
> >Now, the same sequence with a compaction thrown in the middle:
> >
> >   1.  ins  val1  with TS=6 set by client
> >   2.  del  entire row at TS=6 set by client
> >   3.  ---- table is compacted -----
> >   4.  ins  val2  with TS=6  set by client
> >   5.  read row
> >
> >The row returns val2.  (the delete at step2 got lost due to compaction).
> >
> >So we have different results depending upon whether an internal
> >re-organization (like a compaction) happened or not. If we want both
> >sequences to behave exactly the same, then we need to first choose what is
> >the proper (and deterministic) behavior.
> >
> >A.  if we think that the first sequence is the correct one, then the delete
> >at step 2 needs to be preserved forever.
> >
> >or,
> >
> >B. if we think that the second sequence is the correct behavior (i.e., a read
> >always produces the same results independent of compaction), then the
> >record needs a second "internal TS" field to allow the RS to distinguish
> >the real sequence of events, and not rely upon the TS field which is
> >settable by the client.
> >
> >My opinion:
> >
> >We should do B.  It is normal for someone to write code that says  "if old
> >exists, delete it;  add new". A subsequent read should always reliably
> >return "new".
> >
> >The current way of relying on a client-settable TS field to determine
> >causal order results in quirky behavior, and quirky is not good.
> >
> >
> >
> >> Look at these two examples:
> >>
> >> 1. insert Val1  at real time t1
> >> 2. <del>  at real time t2 > t1
> >> 3. insert  Val2 at real time  t3 > t2
> >>
> >> 1. insert Val1  with TS=1 at real time t1
> >> 2. <del>  with TS = 2 at real time t2 > t1
> >>
> >> 3. insert  Val2 with TS = 3 at real time  t3 > t2
> >>
> >>
> >> In both cases Val2 is visible.
> >>
> >> If your code sets your own timestamps, you better know what you're
> >> doing :)
> >>
> >> Note that my examples below are confusing even if you know how deletion in
> >> HBase works.
> >> You have to look at Delete.java to figure out what is happening.
> >> OK, since there were no objections in two days, I will commit my
> >> proposed change in HBASE-5205.
> >>
> >>
> >> -- Lars
> >>
> >> ________________________________
> >> From: M. C. Srivas <[email protected]>
> >> To: [email protected]; lars hofhansl <[email protected]>
> >> Sent: Tuesday, January 17, 2012 8:13 AM
> >> Subject: Re: Delete client API.
> >>
> >>
> >> Delete seems to be confusing in general. Here are some examples that make
> >> me scratch my head (the key is the same in all the examples):
> >>
> >> Example1:
> >> ----------------
> >> 1. insert Val3  with TS=3  at real time t1
> >> 2. insert Val5  with TS=5  at real time t2 > t1
> >> 3. <del>    at real time t3 > t2
> >> 4. insert  Val6  with TS=6  at real time  t4 > t3
> >>
> >> What does a read return?  (I would expect Val6, since it was done last).
> >> But depending upon whether compaction happened or not between steps 3 and
> >> 4, I get either Val6 or nothing.
> >>
> >> Example 2:
> >> -----------------
> >> 1. insert Val3  with TS=3  at real time t1
> >> 2. insert Val5  with TS=5  at real time t2 > t1
> >> 3. <del>  TS=6  at real time t3 > t2
> >> 4. insert  Val6  with TS=6  at real time  t4 > t3
> >>
> >> Note the difference in step 3 is this time a TS was specified by the
> >> client.
> >>
> >> What does a read return?  Again, I expect Val6 to be returned. But
> >> depending upon what's going on, I seem to get either Val5 or Val6.
> >>
> >>
> >>
> >>
> >>
> >> On Sun, Jan 15, 2012 at 7:21 PM, lars hofhansl <[email protected]>
> >> wrote:
> >>
> >> There are some confusing parts about the Delete client API:
> >> >1. calling deleteFamily removes all prior column or columns markers
> >> without checking the TS.
> >> >2. delete{Column|Columns|Family} do not use the timestamp passed to
> >> Delete at construction time, but instead default to LATEST_TIMESTAMP.
> >> >
> >> >  Delete d = new Delete(R,T);
> >> >  d.deleteFamily(CF);
> >> >
> >> >Does not do what you expect (won't use T for the family delete, but
> >> rather the current time).
> >> >
> >> >Neither does
> >> >  d.deleteColumns(CF, C1, T2);
> >> >  d.deleteFamily(CF, T1); // T1 < T2
> >> >
> >> >
> >> >(the columns marker will be removed)
> >> >
> >> >
> >> >#1 prevents Delete from adding a family marker F for time T1 and a
> >> column/columns marker for columns of F at T2 even if T2 > T1.
> >> >#2 is just unexpected and different from what Put is doing.
> >> >
> >> >In HBASE-5205 I propose a simple patch to fix this.
> >> >
> >> >Since this is a (slight) API change, please provide feedback.
> >> >
> >> >Thanks.
> >> >
> >> >-- Lars
> >> >
> >> >
> >>
>
