[
https://issues.apache.org/jira/browse/HBASE-5993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13285764#comment-13285764
]
Jacques commented on HBASE-5993:
--------------------------------
The reason this can make sense is data overhead. When we are capturing a large
number of small values, the per-KeyValue overhead is substantial.
The original use case is one where I'm adding to a list of documents that
contain a certain term (a search index). Let's say that each document number is
a four-byte int. Right now there are two options. The first is to use the
existing append, which means one becomes swamped with reads as the cell value
grows over time (this would also wreak havoc on memstore flushes, as the cell
value becomes megabytes in size while we're just adding another four bytes once
a day). The second is to use separate columns, which creates a substantial
amount of overhead for each value added. The utility of this functionality also
extends to situations where people are capturing a large sequence of small
values in monitoring applications. (Sematext are basically trying to create
this functionality already with their HBaseHUT work.)
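To put a rough number on the separate-columns overhead, here is a
back-of-the-envelope sketch. It assumes the classic KeyValue layout (4-byte key
length, 4-byte value length, 2-byte row length, 1-byte family length, 8-byte
timestamp, 1-byte type) and ignores storage-level effects such as compression
or block encoding. The row, family, and qualifier sizes are made up for
illustration, and KeyValueOverhead is not an HBase class.

```java
// Rough per-cell size model; not an HBase class.
final class KeyValueOverhead {
    // key length + value length + row length + family length + timestamp + type
    static final int FIXED_BYTES = 4 + 4 + 2 + 1 + 8 + 1;

    // Approximate serialized size of one KeyValue under the assumed layout.
    static int serializedSize(int rowLen, int familyLen, int qualifierLen, int valueLen) {
        return FIXED_BYTES + rowLen + familyLen + qualifierLen + valueLen;
    }

    public static void main(String[] args) {
        // Hypothetical schema: 8-byte term key as the row, 1-byte family,
        // the 4-byte document number as the qualifier, and an empty value.
        int payload = 4;
        int perCell = serializedSize(8, 1, 4, 0);
        System.out.printf("%d bytes stored per %d payload bytes (%.2fx)%n",
                perCell, payload, (double) perCell / payload);
    }
}
```

Under those assumed sizes, every four-byte document number costs several times
its own size on disk, which is the overhead I'm trying to avoid.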
Yes, an additional KeyValue.Type is needed. When a cell of this type is read,
the read path fetches all the appended values (and the last full value) and
combines them before returning. As compactions are done, the complete merged
values are created.
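The read and compaction behavior described above might look roughly like the
following. This is only a sketch with stand-in types: Cell, Type.APPEND_DELTA,
read, and compact are hypothetical names, not existing HBase classes or the
final design.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

final class NoReadAppendSketch {
    // APPEND_DELTA stands in for the proposed new KeyValue.Type (hypothetical name).
    enum Type { PUT, APPEND_DELTA }

    static final class Cell {
        final long timestamp;
        final Type type;
        final byte[] value;

        Cell(long timestamp, Type type, byte[] value) {
            this.timestamp = timestamp;
            this.type = type;
            this.value = value;
        }
    }

    // Versions arrive newest first. Collect deltas until the last full value
    // (a PUT) is reached, then concatenate everything oldest first.
    static byte[] read(List<Cell> newestFirst) {
        List<byte[]> parts = new ArrayList<>();
        for (Cell c : newestFirst) {
            parts.add(c.value);
            if (c.type == Type.PUT) {
                break; // older versions are shadowed by this full value
            }
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = parts.size() - 1; i >= 0; i--) {
            byte[] p = parts.get(i);
            out.write(p, 0, p.length);
        }
        return out.toByteArray();
    }

    // Compaction replaces the chain with a single merged full value.
    static List<Cell> compact(List<Cell> newestFirst) {
        List<Cell> result = new ArrayList<>();
        if (!newestFirst.isEmpty()) {
            long newest = newestFirst.get(0).timestamp;
            result.add(new Cell(newest, Type.PUT, read(newestFirst)));
        }
        return result;
    }
}
```

If no full value exists at all, the loop simply runs to the end and returns the
concatenated deltas, so a no-read append against an empty cell still reads back
correctly, just more slowly until a compaction merges the chain.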
I'm swamped right now but am going to try to write up a short design doc in the
next couple of weeks and get you guys to review my approach since this will
have to touch a number of components. I also need to make sure to manage edge
cases like what happens if you do a no-read append and no existing value exists
(probably ok--even though read back performance will be poor).
> Add a no-read Append
> --------------------
>
> Key: HBASE-5993
> URL: https://issues.apache.org/jira/browse/HBASE-5993
> Project: HBase
> Issue Type: Improvement
> Components: regionserver
> Affects Versions: 0.94.0
> Reporter: Jacques
> Priority: Critical
>
> HBASE-4102 added an atomic append. For high performance situations, it would
> be helpful to be able to do appends that don't actually require a read of the
> existing value. This would be useful in building a growing set of values.
> Our original use case was for implementing a form of search in HBase where a
> cell would contain a list of document ids associated with a particular
> keyword for search. However it seems like it would also be useful to provide
> substantial performance improvements for most Append scenarios.
> Within the client API, the simplest way to implement this would be to
> leverage the existing Append API. If the Append is marked as
> setReturnResults(false), use the no-read code path. If result return is
> requested, use the existing Append implementation.
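As a rough illustration of the dispatch proposed in the description, the sketch
below models a single cell on the server side. CellStore and append are
hypothetical stand-ins, not HBase classes; the boolean mirrors what the client
would set through setReturnResults on the Append.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

final class AppendDispatchSketch {
    // Stand-in for one cell's stored entries, oldest first; not an HBase class.
    static final class CellStore {
        final List<byte[]> entries = new ArrayList<>();
        int reads = 0;

        // Reads and concatenates everything stored so far, counting the read.
        byte[] readMerged() {
            reads++;
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            for (byte[] e : entries) {
                out.write(e, 0, e.length);
            }
            return out.toByteArray();
        }
    }

    static byte[] append(CellStore store, byte[] delta, boolean returnResults) {
        if (!returnResults) {
            // Proposed no-read path: write the delta blindly, nothing to return.
            store.entries.add(delta);
            return null;
        }
        // Existing behavior: read the current value, then write the full new value.
        byte[] current = store.readMerged();
        byte[] updated = new byte[current.length + delta.length];
        System.arraycopy(current, 0, updated, 0, current.length);
        System.arraycopy(delta, 0, updated, current.length, delta.length);
        store.entries.clear();
        store.entries.add(updated);
        return updated;
    }
}
```

In this model the no-read path never touches readMerged, so the cost of an
append stays constant no matter how large the cell has grown.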