[ 
https://issues.apache.org/jira/browse/HBASE-5993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13285764#comment-13285764
 ] 

Jacques commented on HBASE-5993:
--------------------------------

The reason this can make sense is data overhead.  When we are capturing a large
number of small values, the per-KeyValue overhead is substantial.  The original
use case is one where I'm adding to a list of documents that contain a certain
term (a search index).  Let's say that each document number is a four-byte int.
Right now there are two options.  The first is to use the existing append,
which means one will become swamped with reads as the cell value grows over
time (this would also wreak havoc on memstore flushes once the cell value
becomes megabytes in size and we're just adding another four bytes once a day).
The second is to use separate columns, which creates a substantial amount of
overhead for each value added.  The utility of this functionality also extends
to situations where people are capturing a large sequence of small values in
monitoring applications.  (Sematext are basically trying to create this
functionality already with their HBaseHUT work.)
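
To put a rough number on the separate-columns overhead, here is a back-of-the-envelope sketch based on the serialized KeyValue layout (length prefixes, row, family, qualifier, timestamp, type, value).  The row, family and qualifier sizes are made-up assumptions for illustration, not measurements from a real table, and the class name is hypothetical.

```java
// Rough per-cell size estimate following the serialized KeyValue layout:
// keyLength(4) + valueLength(4) + rowLength(2) + row + familyLength(1) + family
// + qualifier + timestamp(8) + type(1) + value.
final class KeyValueOverhead {
    // Bytes every cell pays regardless of row/family/qualifier/value contents.
    static final int FIXED_BYTES = 4 + 4 + 2 + 1 + 8 + 1;

    static int serializedSize(int rowLen, int familyLen, int qualifierLen, int valueLen) {
        return FIXED_BYTES + rowLen + familyLen + qualifierLen + valueLen;
    }

    public static void main(String[] args) {
        // Assumed sizes: 10-byte term as row key, 1-byte family,
        // 4-byte qualifier, 4-byte document id as the value.
        int perCell = serializedSize(10, 1, 4, 4);
        System.out.println(perCell + " bytes stored per 4-byte document id");
    }
}
```

Under those assumed sizes each four-byte document id costs 39 bytes as its own cell, before any block or file-level overhead and before compression.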

Yes, an additional KeyValue.Type is needed.  When a cell of this type is read,
the read path fetches all the appended values (and the last full value) and
combines them before returning.  As compactions are done, the complete merged
values are created.
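
A minimal sketch of that merge-on-read step, to make the idea concrete.  This is not HBase code: the Cell class, the APPEND_FRAGMENT type name and the merge method are all hypothetical stand-ins, and the sketch assumes the versions of one column arrive newest first.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class NoReadAppendMerge {
    // APPEND_FRAGMENT is a hypothetical name for the proposed new KeyValue.Type.
    enum Type { PUT, APPEND_FRAGMENT }

    // Simplified stand-in for one version of a cell.
    static final class Cell {
        final long timestamp;
        final Type type;
        final byte[] value;

        Cell(long timestamp, Type type, byte[] value) {
            this.timestamp = timestamp;
            this.type = type;
            this.value = value;
        }
    }

    // Combines the last full value with every fragment appended after it.
    // Input is assumed to be sorted newest first.
    static byte[] merge(List<Cell> newestFirst) {
        List<Cell> parts = new ArrayList<>();
        for (Cell c : newestFirst) {
            parts.add(c);
            if (c.type == Type.PUT) {
                break; // last full value found; anything older is shadowed
            }
        }
        // If no PUT was found, the result is just the fragments concatenated.
        Collections.reverse(parts); // oldest first, so appends land in write order
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (Cell c : parts) {
            out.write(c.value, 0, c.value.length);
        }
        return out.toByteArray();
    }
}
```

In this sketch a compaction would call the same merge and write the result back as a single full value, so later reads have fewer fragments to combine.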

I'm swamped right now but am going to try to write up a short design doc in the 
next couple of weeks and get you guys to review my approach since this will 
have to touch a number of components.  I also need to make sure to manage edge 
cases like what happens if you do a no-read append and no existing value exists 
(probably ok--even though read back performance will be poor).  


                
> Add a no-read Append
> --------------------
>
>                 Key: HBASE-5993
>                 URL: https://issues.apache.org/jira/browse/HBASE-5993
>             Project: HBase
>          Issue Type: Improvement
>          Components: regionserver
>    Affects Versions: 0.94.0
>            Reporter: Jacques
>            Priority: Critical
>
> HBASE-4102 added an atomic append.  For high performance situations, it would 
> be helpful to be able to do appends that don't actually require a read of the 
> existing value.  This would be useful in building a growing set of values.  
> Our original use case was for implementing a form of search in HBase where a 
> cell would contain a list of document ids associated with a particular 
> keyword for search.  However it seems like it would also be useful to provide 
> substantial performance improvements for most Append scenarios.
> Within the client API, the simplest way to implement this would be to 
> leverage the existing Append api.  If the Append is marked as 
> setReturnResults(false), use this code path.  If result return is requested, 
> use the existing Append implementation.  


        
