[
https://issues.apache.org/jira/browse/HBASE-1249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12693609#action_12693609
]
Jonathan Gray commented on HBASE-1249:
--------------------------------------
Ah, so not setting it and performing deletes on the memcache means that reading
a deletefamily tells you everything in prior storefiles is deleted for that row.
I guess I just don't agree with that kind of selective restriction for
performance unless we're going to make a conscious and logical design decision.
There's a very clear and logical argument for disallowing the manual setting
of timestamps. However, this ability is part of the BigTable spec and there
are numerous use cases for this (including pset). Allowing it closes the door
on potential optimizations for those of us who have no need to manually set
them, but it's not terrible to allow it as long as they're only in the past.
The same argument can be applied to this and a bunch of other issues we've been
tossing back and forth.
Let's not make these kinds of decisions without deciding what our requirements
are. Either timestamp is a user-settable attribute, or it isn't. I think it
should be. Part of the issue with the current API is that you can do certain
things in one part of the API that aren't supported in another.
Scanning and versions don't play nice even though we can logically support both.
There shouldn't be caveats like, you can insert at any time in the past, but if
you want to delete a row, you can only delete every version or particular
versions of particular columns, not all versions older than a specified stamp.
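To make the missing operation concrete, here is a minimal Java sketch of a row
delete that masks all versions at or before a given stamp. The names (Cell,
applyRowDelete) are hypothetical and not part of the HBase API, and treating a
version whose stamp equals the delete stamp as deleted is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

class DeleteOlderThanSketch {
    // Hypothetical cell model: one version of one column in a row.
    static final class Cell {
        final String column;
        final long timestamp;
        final String value;

        Cell(String column, long timestamp, String value) {
            this.column = column;
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    // Returns the cells still visible after a row delete issued at deleteStamp.
    // Assumption: versions with timestamp <= deleteStamp are masked, newer ones remain.
    static List<Cell> applyRowDelete(List<Cell> cells, long deleteStamp) {
        List<Cell> visible = new ArrayList<>();
        for (Cell c : cells) {
            if (c.timestamp > deleteStamp) {
                visible.add(c);
            }
        }
        return visible;
    }
}
```

With versions at stamps 10, 20 and 30, a delete at stamp 20 would leave only
the version at 30, regardless of which column each version belongs to.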
Erik's digging has shown numerous potential optimizations for the future, very
good stuff. But let's not alter our requirements or the properties of HBase in
significant ways in the name of minor optimization of edge cases.
If I understand correctly, even with #2 if you do a deleteFamily and specify
NOW, it would have the same early-out possibility as with #1. I see a
DeleteFamily with a stamp that is newer than the latest stamp in the next
storefile. I know all columns are deleted so I do nothing. Enforcing the
deletes in memcache means you tuck it away until the next storefile anyway.
So implementation is identical with #2 if used in the way #1 forces you to.
But you remove the ability of the user to put a past stamp in. And this just
adds additional caveats instead of keeping it simple. If a user does a
deletefamily with a past stamp, then read queries would need to open additional
stores. That's required for correctness of the query. It is not an
inefficiency; it is what the user wants to happen if he uses puts and deletes
in this way.
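A rough Java sketch of the early-out described above, under stated assumptions:
each storefile is summarized only by the latest stamp it contains, and a file
is skipped when that stamp is not newer than the DeleteFamily stamp. The names
(StoreFileInfo, storeFilesToOpen) are hypothetical, not the actual server code.

```java
import java.util.ArrayList;
import java.util.List;

class DeleteFamilyEarlyOutSketch {
    // Hypothetical summary of a storefile: a name and the newest stamp it holds.
    static final class StoreFileInfo {
        final String name;
        final long latestStamp;

        StoreFileInfo(String name, long latestStamp) {
            this.name = name;
            this.latestStamp = latestStamp;
        }
    }

    // Returns the names of the storefiles a read must still open after seeing
    // a DeleteFamily with the given stamp.
    static List<String> storeFilesToOpen(long deleteFamilyStamp,
                                         List<StoreFileInfo> storeFiles) {
        List<String> toOpen = new ArrayList<>();
        for (StoreFileInfo sf : storeFiles) {
            // Early-out: every version in this file is at or before the delete
            // stamp, so nothing in it can be visible.
            if (sf.latestStamp <= deleteFamilyStamp) {
                continue;
            }
            toOpen.add(sf.name);
        }
        return toOpen;
    }
}
```

A DeleteFamily stamped NOW skips every storefile, which is the case #1
enforces. A DeleteFamily with a past stamp leaves the newer files to be opened,
which is the extra work correctness demands.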
> Rearchitecting of server, client, API, key format, etc for 0.20
> ---------------------------------------------------------------
>
> Key: HBASE-1249
> URL: https://issues.apache.org/jira/browse/HBASE-1249
> Project: Hadoop HBase
> Issue Type: Improvement
> Reporter: Jonathan Gray
> Priority: Blocker
> Fix For: 0.20.0
>
> Attachments: HBASE-1249-Example-v1.pdf, HBASE-1249-Example-v2.pdf,
> HBASE-1249-GetQuery-v1.pdf, HBASE-1249-GetQuery-v2.pdf,
> HBASE-1249-GetQuery-v3.pdf, HBASE-1249-StoreFile-v1.pdf
>
>
> To discuss all the new and potential issues coming out of the change in key
> format (HBASE-1234): zero-copy reads, client binary protocol, update of API
> (HBASE-880), server optimizations, etc...