[ 
https://issues.apache.org/jira/browse/HBASE-2265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838251#action_12838251
 ] 

ryan rawson commented on HBASE-2265:
------------------------------------

I'm not sure this will help make gets better; there are two get cases:

- Get a single column for a row.  In this case, if timestamps are written out 
of order, we don't know which hfile to start with.  Let's say we start with the 
'newest' one, and it yields TS[1].  Does the fact that an older file satisfies 
start < TS[1] < end mean we should consult that file?  If end < TS[1] (that is, 
the timestamp we already got is newer than anything in that file), we'd know 
there is nothing newer and could conclusively rule that file out.  If TS[1] 
were < the beginning of a file, we'd have to consider the file.  With a big 
spread of timestamps and keys, we wouldn't get much of an optimization.

- For a complete column family get, we'll have to touch every file, every time. 
This is because we are never sure whether the next file contains another 
key/value for the result.  A bloom filter would help here.

As for the scan, we already know which files are 'newer'.  However, during a 
compaction this information is collapsed, and we end up with duplicate 
key/values sitting next to each other.  We might be able to establish an 
invariant that during compaction the 'newer' key/value comes first.  Compaction 
might be able to help straighten this out, since I think we do minor 
compactions 'in order', with older files first.  Seems like a tricky bit. 

Generally, the ideal solution would involve no change to the KeyValue 
serialization format, since a format change could require rewriting existing 
store files.

> HFile and Memstore should maintain minimum and maximum timestamps
> -----------------------------------------------------------------
>
>                 Key: HBASE-2265
>                 URL: https://issues.apache.org/jira/browse/HBASE-2265
>             Project: Hadoop HBase
>          Issue Type: Improvement
>          Components: regionserver
>            Reporter: Todd Lipcon
>
> In order to fix HBASE-1485 and HBASE-29, it would be very helpful to have 
> HFile and Memstore track their maximum and minimum timestamps. This has the 
> following nice properties:
> - for a straight Get, if an entry has already been found with timestamp 
> X, and X >= HFile.maxTimestamp, the HFile doesn't need to be checked. Thus, 
> the current fast behavior of get can be maintained for those who use strictly 
> increasing timestamps, while still providing "correct" behavior for those who 
> sometimes write out-of-order.
> - for a scan, the "latest timestamp" of the storage can be used to decide 
> which cell wins, even if the timestamps of the cells are equal. In essence, 
> rather than comparing timestamps alone, you compare tuples of 
> (row timestamp, storage.max_timestamp)
> - in general, min_timestamp(storage A) >= max_timestamp(storage B) if storage 
> A was flushed after storage B.
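The tuple comparison proposed in the issue description above could be sketched as follows. This is a hypothetical illustration, not HBase API; firstWins and its parameters are made-up names, and it assumes each cell carries the max_timestamp of the storage it came from:

```java
// Hypothetical sketch of the proposed tie-break: compare cell timestamps
// first; on a tie, the cell from the storage with the larger max_timestamp
// (i.e., the more recently flushed storage) wins.
public class CellTieBreak {

    static boolean firstWins(long ts1, long storageMaxTs1,
                             long ts2, long storageMaxTs2) {
        if (ts1 != ts2) {
            return ts1 > ts2;                  // newer cell timestamp wins outright
        }
        return storageMaxTs1 > storageMaxTs2;  // tie: later-flushed storage wins
    }

    public static void main(String[] args) {
        // Equal cell timestamps: the cell from the later-flushed storage wins.
        System.out.println(firstWins(100, 500, 100, 400));  // true
    }
}
```

This works precisely because of the last property in the description: if storage A was flushed after storage B, then min_timestamp(A) >= max_timestamp(B), so the storage max timestamps order the storages by flush time.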

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
