[
https://issues.apache.org/jira/browse/HBASE-1249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12682935#action_12682935
]
Jonathan Gray commented on HBASE-1249:
--------------------------------------
We need to do some testing on that. Scanning through the deletes in the
memcache might be pretty fast regardless. However, I think it sounds like a
good idea and a basis for some further thoughts.
And yeah, there should probably be no such thing as a DeleteRow on the server.
This is especially true with locality groups, since you'd need to seek to the
start of the row every time before seeking down to your family.
But thinking more about memcache deletes: when we flush the memcache, we can
guarantee that none of the values being flushed have been deleted (if we do as
above and apply deletes to the memcache). So we are left with a list of deletes
that apply only to older store files. Then we start a new memcache.
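A minimal Java sketch of the flush idea above, assuming a memcache that applies
each column delete to its own contents as soon as the delete arrives.
SketchMemcache, ColumnDelete and FlushResult are hypothetical names, not HBase
classes, and rows and families are left out. What it shows: everything a flush
writes out is already live, and the retained deletes exist only for older
store files.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical column delete: removes every version of `column` whose
// timestamp is <= `stamp`. Rows and families are omitted to keep this small.
final class ColumnDelete {
    final String column;
    final long stamp;

    ColumnDelete(String column, long stamp) {
        this.column = column;
        this.stamp = stamp;
    }
}

// Hypothetical result of a flush: live cells plus deletes for older files.
final class FlushResult {
    final Map<String, TreeMap<Long, String>> liveCells;
    final List<ColumnDelete> deletesForOlderFiles;

    FlushResult(Map<String, TreeMap<Long, String>> liveCells,
                List<ColumnDelete> deletesForOlderFiles) {
        this.liveCells = liveCells;
        this.deletesForOlderFiles = deletesForOlderFiles;
    }
}

// Hypothetical memcache that applies deletes as soon as they arrive.
final class SketchMemcache {
    // column -> (timestamp -> value)
    private Map<String, TreeMap<Long, String>> cells = new TreeMap<>();
    private List<ColumnDelete> pendingDeletes = new ArrayList<>();

    void put(String column, long ts, String value) {
        cells.computeIfAbsent(column, c -> new TreeMap<>()).put(ts, value);
    }

    void delete(ColumnDelete d) {
        TreeMap<Long, String> versions = cells.get(d.column);
        if (versions != null) {
            // Drop the covered versions from the memcache right away...
            versions.headMap(d.stamp, true).clear();
            if (versions.isEmpty()) {
                cells.remove(d.column);
            }
        }
        // ...but keep the delete, since it may still cover older store files.
        pendingDeletes.add(d);
    }

    // Nothing flushed here has been deleted; the deletes only matter for older files.
    FlushResult flush() {
        FlushResult result = new FlushResult(cells, pendingDeletes);
        cells = new TreeMap<>();            // start a new, empty memcache
        pendingDeletes = new ArrayList<>();
        return result;
    }
}
```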
When we read in the newest storefile, we know that we can process it without
looking at any deletes except those in the new memcache. The deletes in this
storefile aren't needed until the second-newest one is looked at, and at that
point we can read them in bulk from the previous storefile, which is already
open. We can even compare the stamps of the deletes with the storefile stamps
and the possible query stamps to exit early. This is a far cry from how things
are now... deletes are interspersed and duplicated everywhere.
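The read path above can be sketched as follows, assuming store files reduced to
a single column and only column deletes (which delete everything <= a stamp, so
a running maximum is enough). StoreFileSketch and LayeredGet are hypothetical
names. The newest file is filtered only by the memcache deletes; each file's own
deletes join the deleteset just before the next older file is read, and a file
whose newest stamp is below the query range, or already covered by the
collected deletes, is skipped.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical store file for one column: `versions` were already filtered
// at flush time, and `deleteStamps` are column deletes (everything <= stamp)
// that apply only to OLDER files.
final class StoreFileSketch {
    final long maxStamp;
    final List<Long> versions;
    final List<Long> deleteStamps;

    StoreFileSketch(long maxStamp, List<Long> versions, List<Long> deleteStamps) {
        this.maxStamp = maxStamp;
        this.versions = versions;
        this.deleteStamps = deleteStamps;
    }
}

final class LayeredGet {
    // Reads files newest first; a file's deletes are merged into the running
    // deleteset only after that file itself has been read.
    static List<Long> get(List<StoreFileSketch> newestFirst,
                          List<Long> memcacheDeletes, long queryMinStamp) {
        // Column deletes cover everything <= stamp, so a high-water mark suffices.
        long deletedUpTo = Long.MIN_VALUE;
        for (long s : memcacheDeletes) {
            deletedUpTo = Math.max(deletedUpTo, s);
        }
        List<Long> result = new ArrayList<>();
        for (StoreFileSketch f : newestFirst) {
            // Early out: skip the cells if none can match the query or survive the deletes.
            boolean worthReading = f.maxStamp >= queryMinStamp && f.maxStamp > deletedUpTo;
            if (worthReading) {
                for (long ts : f.versions) {
                    if (ts >= queryMinStamp && ts > deletedUpTo) {
                        result.add(ts);
                    }
                }
            }
            // This file's deletes only matter from the next (older) file on.
            for (long s : f.deleteStamps) {
                deletedUpTo = Math.max(deletedUpTo, s);
            }
        }
        return result;
    }
}
```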
It does seem to make sense to have the deletes sorted above where they apply,
but then we'd have to check those sections first before reading. Well, come to
think of it, it could make more sense to sort them below. The only time we
actually have deletes in a storefile is when they need to be applied to older
storefiles. So we can scan these deletes at the end: once we have read past
what we wanted (and still need to read additional storefiles), we can scan and
seek for any deletes pertaining to this row/family/column.
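A sketch of the "deletes sorted below" layout, assuming a hypothetical
simplified Kv (row, column, stamp and a put/delete flag; no family or type
codes) and a TreeSet standing in for a sorted storefile. The comparator places
every delete after every put in the same row, so a reader can finish the puts
it wanted and then seek straight into the row's delete block for the column in
question.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical, simplified KeyValue; real keys carry more fields.
final class Kv {
    final String row;
    final String column;
    final long stamp;
    final boolean isDelete;

    Kv(String row, String column, long stamp, boolean isDelete) {
        this.row = row;
        this.column = column;
        this.stamp = stamp;
        this.isDelete = isDelete;
    }
}

final class DeletesLastOrder {
    // Within a row every put sorts before every delete, so the deletes form a
    // block at the end of the row. Inside each block: column asc, stamp desc.
    static final Comparator<Kv> ORDER = Comparator
            .comparing((Kv kv) -> kv.row)
            .thenComparing(kv -> kv.isDelete)   // false (put) < true (delete)
            .thenComparing(kv -> kv.column)
            .thenComparing(Comparator.comparingLong((Kv kv) -> kv.stamp).reversed());

    static NavigableSet<Kv> newFile() {
        return new TreeSet<>(ORDER);
    }

    // After reading past the wanted puts, seek into the row's delete block
    // and collect the deletes for one column.
    static List<Kv> seekDeletes(NavigableSet<Kv> file, String row, String column) {
        // Long.MAX_VALUE sorts first because stamps are descending.
        Kv probe = new Kv(row, column, Long.MAX_VALUE, true);
        List<Kv> out = new ArrayList<>();
        for (Kv kv : file.tailSet(probe, true)) {
            if (!kv.row.equals(row) || !kv.isDelete || !kv.column.equals(column)) {
                break;
            }
            out.add(kv);
        }
        return out;
    }
}
```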
Those deletes are added to the in-memory deleteset for the remaining storefiles.
Any rewriting of files must enforce deletions across them, and if not all files
are combined, the ones that are must be sequential in age.
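A rough sketch of this rewriting rule, assuming the same single-column,
column-delete-only model; AgedFile and CompactionSketch are hypothetical names.
Taking a [from, to) range of the newest-first file list makes the inputs
sequential in age by construction, and deletes from newer inputs are enforced
on older inputs. Dropping the deletes only when the run reaches the oldest file
is an assumption, following the point above that deletes in a storefile exist
only for older storefiles.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical store file for one column: live version stamps plus column
// deletes (everything <= stamp) that apply only to files OLDER than this one.
final class AgedFile {
    final List<Long> versions;
    final List<Long> deletes;

    AgedFile(List<Long> versions, List<Long> deletes) {
        this.versions = versions;
        this.deletes = deletes;
    }
}

final class CompactionSketch {
    // Rewrites the contiguous run [from, to) of the newest-first list.
    static AgedFile compact(List<AgedFile> newestFirst, int from, int to) {
        if (from < 0 || to > newestFirst.size() || from >= to) {
            throw new IllegalArgumentException("need a non-empty contiguous run");
        }
        long deletedUpTo = Long.MIN_VALUE;
        List<Long> live = new ArrayList<>();
        List<Long> carried = new ArrayList<>();
        for (int i = from; i < to; i++) {
            AgedFile f = newestFirst.get(i);
            // Deletes from newer inputs are enforced on this file's versions.
            for (long ts : f.versions) {
                if (ts > deletedUpTo) {
                    live.add(ts);
                }
            }
            for (long d : f.deletes) {
                deletedUpTo = Math.max(deletedUpTo, d);
                carried.add(d);
            }
        }
        // Older files outside the run may still need these deletes.
        boolean includesOldest = to == newestFirst.size();
        List<Long> kept = includesOldest ? new ArrayList<Long>() : carried;
        return new AgedFile(live, kept);
    }
}
```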
So DeleteRow and DeleteFamily would take no time parameters and would be stored
with the time of deletion. Their KeyValues would sort at the end of the row,
meaning you need to scan to this spot any time you reach the end of what you're
reading from that store's row and need to read the next storefile.
DeleteColumn would use now by default, or you could specify a stamp and it
would delete everything <= that stamp. This _could_ sort at the end of the
column, but is there any point? It should probably go at the end of the row,
since that is where you have to seek to look for a DeleteFamily anyway.
Delete would be the same: sorted at the end of the row. We just need to get the
deleteset and comparators right so they can match these different delete types
against the different cell KeyValues.
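The matching the deleteset has to do can be sketched like this, assuming string
keys and "now" taken from System.currentTimeMillis(). DeleteSetSketch and its
method names are hypothetical, not the 0.20 API, and DeleteRow is left out. A
DeleteFamily stored at its deletion time covers every cell in the family at or
below that time, a DeleteColumn covers the column at or below its stamp
(defaulting to now), and a plain Delete matches one exact version.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory deleteset for the delete types discussed above.
final class DeleteSetSketch {
    private final Map<String, Long> familyUpTo = new HashMap<>(); // family -> stamp
    private final Map<String, Long> columnUpTo = new HashMap<>(); // family:column -> stamp
    private final Set<String> versions = new HashSet<>();         // family:column@stamp

    private static String col(String family, String column) {
        return family + ":" + column;
    }

    // DeleteFamily: no user-supplied time; stored with the time of deletion.
    void deleteFamily(String family) {
        deleteFamilyAt(family, System.currentTimeMillis());
    }

    void deleteFamilyAt(String family, long deletionTime) {
        familyUpTo.merge(family, deletionTime, Long::max);
    }

    // DeleteColumn: defaults to now, or deletes everything <= the given stamp.
    void deleteColumn(String family, String column) {
        deleteColumn(family, column, System.currentTimeMillis());
    }

    void deleteColumn(String family, String column, long stamp) {
        columnUpTo.merge(col(family, column), stamp, Long::max);
    }

    // Delete: removes exactly one version.
    void delete(String family, String column, long stamp) {
        versions.add(col(family, column) + "@" + stamp);
    }

    boolean isDeleted(String family, String column, long stamp) {
        Long f = familyUpTo.get(family);
        if (f != null && stamp <= f) {
            return true;
        }
        Long c = columnUpTo.get(col(family, column));
        if (c != null && stamp <= c) {
            return true;
        }
        return versions.contains(col(family, column) + "@" + stamp);
    }
}
```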
It might make sense to have a DeleteRow in this case, since it would be less
work with locality groups. But it's not a big deal either way, really.
> Rearchitecting of server, client, API, key format, etc for 0.20
> ---------------------------------------------------------------
>
> Key: HBASE-1249
> URL: https://issues.apache.org/jira/browse/HBASE-1249
> Project: Hadoop HBase
> Issue Type: Improvement
> Reporter: Jonathan Gray
> Priority: Blocker
> Fix For: 0.20.0
>
>
> To discuss all the new and potential issues coming out of the change in key
> format (HBASE-1234): zero-copy reads, client binary protocol, update of API
> (HBASE-880), server optimizations, etc...