[ 
https://issues.apache.org/jira/browse/HBASE-1249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12680214#action_12680214
 ] 

stack commented on HBASE-1249:
------------------------------

Thanks for opening this, Jon.

I'm currently working on changing the key format, HBASE-1234, as part of a 
regionserver rewrite that does away with HStoreKey, replacing it with a new 
org.apache.hadoop.hbase.regionserver.KeyValue data structure that lives inside 
a ByteBuffer.  The new key format is described in HBASE-1234, and its latest 
manifestation can be found in the github repository here: 
http://github.com/ryanobjc/hbase/blob/5ed35fb55bd4ba2404ecbc94c6c45d7c8a7162e4/src/java/org/apache/hadoop/hbase/regionserver/KeyValue.java

Here is from the class comment:

{code}
* Utility for making, comparing and fetching pieces of a hbase KeyValue blob.
* Blob format is: <keylength> <valuelength> <key> <value>
* Key is decomposed as: <rowlength> <row> <columnfamilylength> <columnfamily> <columnqualifier> <timestamp> <keytype>
* Rowlength maximum is Short.MAX_SIZE, column family length maximum is
* Byte.MAX_SIZE, and column qualifier + value length must be < Integer.MAX_SIZE.
* The column does not contain the family/qualifier delimiter.
{code}
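
To make the layout in the class comment concrete, here is a minimal sketch of packing and unpacking such a blob in a ByteBuffer.  It is not the code from the linked KeyValue.java: the class and method names are made up for illustration, and the field widths (4-byte key and value lengths, 2-byte row length, 1-byte family length, 8-byte timestamp, 1-byte key type) are assumptions inferred from the limits the comment gives.

```java
import java.nio.ByteBuffer;

/** Illustrative only; names and field widths are assumptions, not the real KeyValue class. */
public class KeyValueLayoutSketch {

    // Assumed widths: two 4-byte ints (<keylength>, <valuelength>) precede the key.
    static final int KEY_OFFSET = 8;
    // Assumed trailer of the key: 8-byte <timestamp> plus 1-byte <keytype>.
    static final int TS_AND_TYPE = 9;

    /** Packs the pieces as <keylength> <valuelength> <key> <value>. */
    public static ByteBuffer encode(byte[] row, byte[] family, byte[] qualifier,
                                    long timestamp, byte keyType, byte[] value) {
        if (row.length > Short.MAX_VALUE) throw new IllegalArgumentException("row too long");
        if (family.length > Byte.MAX_VALUE) throw new IllegalArgumentException("family too long");
        int keyLength = 2 + row.length + 1 + family.length + qualifier.length + TS_AND_TYPE;
        ByteBuffer bb = ByteBuffer.allocate(KEY_OFFSET + keyLength + value.length);
        bb.putInt(keyLength);
        bb.putInt(value.length);
        bb.putShort((short) row.length);   // <rowlength>
        bb.put(row);                       // <row>
        bb.put((byte) family.length);      // <columnfamilylength>
        bb.put(family);                    // <columnfamily>
        bb.put(qualifier);                 // <columnqualifier>, no length prefix
        bb.putLong(timestamp);             // <timestamp>
        bb.put(keyType);                   // <keytype>
        bb.put(value);                     // <value>
        bb.flip();
        return bb;
    }

    /** Copies length bytes starting at offset without moving the buffer's position. */
    static byte[] copy(ByteBuffer bb, int offset, int length) {
        byte[] out = new byte[length];
        ByteBuffer dup = bb.duplicate();
        dup.position(offset);
        dup.get(out);
        return out;
    }

    public static byte[] row(ByteBuffer bb) {
        return copy(bb, KEY_OFFSET + 2, bb.getShort(KEY_OFFSET));
    }

    public static byte[] family(ByteBuffer bb) {
        int familyLengthOffset = KEY_OFFSET + 2 + bb.getShort(KEY_OFFSET);
        return copy(bb, familyLengthOffset + 1, bb.get(familyLengthOffset));
    }

    /** The qualifier length is whatever remains of the key after the other fields. */
    public static byte[] qualifier(ByteBuffer bb) {
        int keyLength = bb.getInt(0);
        int rowLength = bb.getShort(KEY_OFFSET);
        int familyLengthOffset = KEY_OFFSET + 2 + rowLength;
        int familyLength = bb.get(familyLengthOffset);
        int qualifierLength = keyLength - (2 + rowLength + 1 + familyLength + TS_AND_TYPE);
        return copy(bb, familyLengthOffset + 1 + familyLength, qualifierLength);
    }

    public static long timestamp(ByteBuffer bb) {
        return bb.getLong(KEY_OFFSET + bb.getInt(0) - TS_AND_TYPE);
    }

    public static byte[] value(ByteBuffer bb) {
        return copy(bb, KEY_OFFSET + bb.getInt(0), bb.getInt(4));
    }
}
```

Because family and qualifier sit next to each other with only the family length in between, no family/qualifier delimiter needs to be stored, and the qualifier can be recovered from the overall key length.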

Here are some notes on what I've learned as part of the rewrite:

+ It turns out we were doing a bunch of expensive column-matching lookups 
(10%+ of all CPU in a recent seek+scan 1000 rows test) that were not 
necessary at all.  The column match was being done in a store/family context, 
so much of the column family parsing, and the fetching from maps of column 
matchers to find which one to use in a particular column context, was not 
needed.
+ How deletes work will have to be redone now that we have a richer delete 
vocabulary.  What was there previously was ugly anyway, so there is no harm in 
a rewrite except for the work of debugging the new implementation.
+ We need to make the ByteBuffer that holds the KV that comes out of hfile 
read-only.
+ We will need to redo memcache size calculations (need Ryan's and Erik's help 
here).
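
On the read-only point in the notes above, one possible approach (an assumption on my part, not something already in the rewrite) is to hand out ByteBuffer.asReadOnlyBuffer() views of the block bytes.  The sketch below uses hypothetical class and method names and only standard java.nio calls:

```java
import java.nio.ByteBuffer;
import java.nio.ReadOnlyBufferException;

/** Illustrative only; not part of the hfile code. */
public class ReadOnlyViewSketch {

    /** Returns a read-only view that shares the block's bytes; nothing is copied. */
    public static ByteBuffer readOnlyView(ByteBuffer block) {
        return block.asReadOnlyBuffer();
    }

    /** Returns true if the buffer refuses writes.  Needs at least one byte in the buffer. */
    public static boolean rejectsWrites(ByteBuffer bb) {
        try {
            // Write the first byte back onto itself so a writable buffer is left unchanged.
            bb.put(0, bb.get(0));
            return false;
        } catch (ReadOnlyBufferException e) {
            return true;
        }
    }
}
```

Two things to keep in mind with this approach: the view still reflects any later writes made through the original buffer, and hasArray() returns false on a read-only buffer, so callers that reach for the backing array directly would have to change.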



> Rearchitecting of server, client, API, key format, etc for 0.20
> ---------------------------------------------------------------
>
>                 Key: HBASE-1249
>                 URL: https://issues.apache.org/jira/browse/HBASE-1249
>             Project: Hadoop HBase
>          Issue Type: Improvement
>            Reporter: Jonathan Gray
>            Priority: Blocker
>             Fix For: 0.20.0
>
>
> To discuss all the new and potential issues coming out of the change in key 
> format (HBASE-1234): zero-copy reads, client binary protocol, update of API 
> (HBASE-880), server optimizations, etc...

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
