All,

Thank you for all the opinions and information. I'll try to
persuade my colleagues as follows:

I couldn't find any compelling example where versioning is clearly
required. However, HBase community members gave me some ideas on
how versioning can be useful:
1. Recovering data lost through accidental deletions or updates
   (I think this is the most persuasive reason.)
2. Auditing (change tracking) for compliance
   However, this is not very persuasive, because advanced RDBMSs
   provide audit trails rather than versioning. Versioning by itself
   does not show who changed the data or how.
3. Recording events (as in Google's personalized search)
   This is not persuasive either. As I wrote in my previous mail,
   embedding the event time in the row key may be better, because it
   keeps the rows from growing too large.
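
To illustrate point 3, here is a minimal Python sketch of a row key
that embeds the event time. It assumes a hypothetical
`<user id>#<reversed timestamp>` layout; the `make_row_key` helper and
the key format are my own assumptions for illustration, not HBase API.
Because HBase sorts rows lexicographically by their bytes, subtracting
the timestamp from Java's `Long.MAX_VALUE` and zero-padding the result
makes a user's newest event sort first, and each event becomes its own
small row instead of another version inside one growing row.

```python
import time

# Java's Long.MAX_VALUE; subtracting a timestamp from it reverses the
# sort order so that the newest event comes first.
MAX_TS_MS = 2**63 - 1


def make_row_key(user_id: str, event_ms: int) -> bytes:
    """Build a hypothetical '<user_id>#<reversed timestamp>' row key."""
    reversed_ts = MAX_TS_MS - event_ms
    # Zero-pad to a fixed width (19 digits) so that byte-wise
    # (lexicographic) order matches the numeric order of reversed_ts.
    return f"{user_id}#{reversed_ts:019d}".encode("ascii")


if __name__ == "__main__":
    now_ms = int(time.time() * 1000)
    older = make_row_key("user42", now_ms - 5000)
    newer = make_row_key("user42", now_ms)
    # Lexicographic order (HBase's row order) puts the newest event first.
    print(sorted([older, newer])[0] == newer)  # True
```

A prefix scan over `user42#` would then return that user's events,
newest first, without any single row becoming large.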

If your requirements do not call for versioning, you can simply ignore
timestamps (you do not have to specify a timestamp in API calls).
Although HBase keeps three versions by default, which may be a bit
wasteful of memory and disk, turning on compression for the column
families may reduce that waste to the point where you can ignore it
(is that true?).
If saving memory (i.e., keeping the MemStore as small as possible) is
important, you can set the maximum number of versions to 1.
The default is 3 in order to rescue users from their own
mistakes.
(Even so, if users accidentally delete or update data, you still have
to develop a tool that retrieves the previous versions.)
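
To make the trade-off concrete, here is a small in-memory model of a
single cell that retains at most `max_versions` timestamped values,
roughly the way a column family's maximum-versions setting bounds what
HBase returns. It is only an illustration under simplifying
assumptions: the `VersionedCell` class and its methods are
hypothetical (not the HBase client API), a local counter stands in for
the server-assigned timestamp when the caller omits one, and HBase's
real delete markers and compaction-time pruning are not modeled, so
it covers accidental updates only. It shows that with 3 versions an
accidental overwrite can still be undone by reading the previous
value, while with 1 version the old value is gone.

```python
import itertools
from typing import List, Optional, Tuple


class VersionedCell:
    """Toy model of one versioned cell (hypothetical, not HBase API)."""

    # Stands in for the timestamp the server assigns when the client
    # does not specify one.
    _clock = itertools.count(1)

    def __init__(self, max_versions: int = 3) -> None:
        self.max_versions = max_versions
        # (timestamp, value) pairs, kept newest first.
        self._versions: List[Tuple[int, str]] = []

    def put(self, value: str, ts: Optional[int] = None) -> int:
        # Assign a timestamp automatically if the caller omits it.
        if ts is None:
            ts = next(VersionedCell._clock)
        self._versions.append((ts, value))
        self._versions.sort(key=lambda v: v[0], reverse=True)
        # Retain only the newest max_versions values.
        del self._versions[self.max_versions:]
        return ts

    def get(self) -> Optional[str]:
        """Return the latest value, as a plain read would."""
        return self._versions[0][1] if self._versions else None

    def get_versions(self) -> List[str]:
        """Return all retained values, newest first."""
        return [value for _, value in self._versions]


if __name__ == "__main__":
    safe = VersionedCell(max_versions=3)
    safe.put("good value")
    safe.put("bad value")        # accidental update
    print(safe.get_versions())   # ['bad value', 'good value']

    lean = VersionedCell(max_versions=1)
    lean.put("good value")
    lean.put("bad value")        # accidental update
    print(lean.get_versions())   # ['bad value']
```

In the first case a recovery tool only has to read the second-newest
version; in the second case there is nothing left to recover.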

Regards
Takayuki
