[jira] Commented: (HBASE-2265) HFile and Memstore should maintain minimum and maximum timestamps

Todd Lipcon (JIRA) Thu, 25 Feb 2010 07:44:49 -0800

    [ 
https://issues.apache.org/jira/browse/HBASE-2265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838382#action_12838382
 ]


Todd Lipcon commented on HBASE-2265:
------------------------------------

bq. With a big spread of timestamps and keys, we wouldn't get much of an 
optimization

Exactly. If users are writing out of order, they cannot take advantage of the 
optimization of culling older storage. As you mentioned, bloom filters help 
here. For users who are writing in order, the performance should be identical 
to what it is today. I think this is exactly what we want.

bq. for a complete column family get, we'll have to touch every file, every 
time. This is because you are never sure if the next file contains another 
key/value for the result. A bloom filter would help here

Yep, and this is exactly what I would expect. Why should a column family get 
_not_ touch all of the files?

bq. However, during a compaction, this information is collapsed, and we end up 
with the duplicate key/values sitting next to each other. We might be able to 
cause/create an invariant that during compaction the 'newer' one comes first

It's probably worth getting consensus, but I think it would be acceptable 
behavior to only retain the keyval from the newest storage when the timestamps 
are equal. That is, if I write A:ts=1, B:ts=2, C:ts=3, D:ts=3, E:ts=3, and want 
to retain "latest 3", I'd end up getting writes A, B, and E.
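
To make the example concrete, here is a rough sketch of that retention rule. 
None of these names are HBase classes. In particular, storageSeq is a 
hypothetical counter that says which storage is newer, assumed here only so 
that ties on timestamp can be broken.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: these types are not HBase classes.
final class VersionRetention {

    /** One write to a single cell. storageSeq is a hypothetical "newer storage" counter. */
    static final class Write {
        final String label;
        final long timestamp;
        final long storageSeq;

        Write(String label, long timestamp, long storageSeq) {
            this.label = label;
            this.timestamp = timestamp;
            this.storageSeq = storageSeq;
        }
    }

    /** Keeps at most maxVersions writes, one per distinct timestamp, newest storage winning ties. */
    static List<Write> retainLatest(List<Write> writes, int maxVersions) {
        List<Write> sorted = new ArrayList<>(writes);
        // Newest timestamp first; among equal timestamps, newest storage first.
        sorted.sort((a, b) -> {
            int byTimestamp = Long.compare(b.timestamp, a.timestamp);
            return byTimestamp != 0 ? byTimestamp : Long.compare(b.storageSeq, a.storageSeq);
        });

        List<Write> kept = new ArrayList<>();
        for (Write w : sorted) {
            if (kept.size() == maxVersions) {
                break;
            }
            // A write with the same timestamp as the last kept one lives in older storage: drop it.
            boolean sameTimestampAsLastKept =
                    !kept.isEmpty() && kept.get(kept.size() - 1).timestamp == w.timestamp;
            if (!sameTimestampAsLastKept) {
                kept.add(w);
            }
        }
        return kept;
    }
}
```

Feeding in A:ts=1, B:ts=2, C:ts=3, D:ts=3, E:ts=3, each in a newer storage 
than the last, and asking for 3 versions keeps E, B, and A (newest first).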

bq. Generally the ideal solution would involve no change to the KeyValue 
serialization format

I agree, and I think this can be done using only the existing metadata fields 
without any change per-keyvalue.

> HFile and Memstore should maintain minimum and maximum timestamps
> -----------------------------------------------------------------
>
>                 Key: HBASE-2265
>                 URL: https://issues.apache.org/jira/browse/HBASE-2265
>             Project: Hadoop HBase
>          Issue Type: Improvement
>          Components: regionserver
>            Reporter: Todd Lipcon
>
> In order to fix HBASE-1485 and HBASE-29, it would be very helpful to have 
> HFile and Memstore track their maximum and minimum timestamps. This has the 
> following nice properties:
> - for a straight Get, if an entry has already been found with timestamp 
> X, and X >= HFile.maxTimestamp, the HFile doesn't need to be checked. Thus, 
> the current fast behavior of get can be maintained for those who use strictly 
> increasing timestamps, while still giving "correct" behavior to those who 
> sometimes write out-of-order.
> - for a scan, the "latest timestamp" of the storage can be used to decide 
> which cell wins, even if the timestamp of the cells is equal. In essence, 
> rather than comparing timestamps, instead you are able to compare tuples of 
> (row timestamp, storage.max_timestamp)
> - in general, min_timestamp(storage A) >= max_timestamp(storage B) if storage 
> A was flushed after storage B.
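
As a rough illustration of the first property above, the Get check might look 
something like the sketch below. Storage, Result, and getLatest are 
hypothetical names, not existing HBase code. Each Storage is assumed to carry 
the file-wide maximum timestamp plus the timestamps it holds for the one key 
being looked up.

```java
import java.util.List;

// Illustrative only: these types are not HBase classes.
final class TimestampRangeGet {

    /** Stand-in for an HFile or Memstore. */
    static final class Storage {
        final long maxTimestamp;     // maximum timestamp of anything in this storage
        final long[] keyTimestamps;  // timestamps stored here for the requested key (may be empty)

        Storage(long maxTimestamp, long... keyTimestamps) {
            this.maxTimestamp = maxTimestamp;
            this.keyTimestamps = keyTimestamps;
        }
    }

    /** Outcome of a Get: whether anything was found, its timestamp, and how many storages were read. */
    static final class Result {
        final boolean found;
        final long timestamp;
        final int storagesRead;

        Result(boolean found, long timestamp, int storagesRead) {
            this.found = found;
            this.timestamp = timestamp;
            this.storagesRead = storagesRead;
        }
    }

    /** Finds the newest timestamp for the key, skipping storages that cannot hold anything newer. */
    static Result getLatest(List<Storage> storages) {
        boolean found = false;
        long best = Long.MIN_VALUE;
        int storagesRead = 0;
        for (Storage s : storages) {
            // If we already hold timestamp X and X >= maxTimestamp, this storage cannot beat it.
            if (found && best >= s.maxTimestamp) {
                continue;
            }
            storagesRead++;
            for (long ts : s.keyTimestamps) {
                if (!found || ts > best) {
                    best = ts;
                    found = true;
                }
            }
        }
        return new Result(found, best, storagesRead);
    }
}
```

With strictly increasing timestamps and storages visited newest first, only 
the first storage containing the key is read. With out-of-order writes, more 
storages are read, but the newest timestamp is still the one returned.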

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
