[
https://issues.apache.org/jira/browse/HBASE-1249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12680214#action_12680214
]
stack commented on HBASE-1249:
------------------------------
Thanks for opening this, Jon.
I'm currently working on changing the key format, HBASE-1234, as part of a
regionserver rewrite that does away with HStoreKey, replacing it with a new
org.apache.hadoop.hbase.regionserver.KeyValue data structure that lives inside
a ByteBuffer. The new key format is described in HBASE-1234 and its latest
manifestation can be found in the GitHub repository here:
http://github.com/ryanobjc/hbase/blob/5ed35fb55bd4ba2404ecbc94c6c45d7c8a7162e4/src/java/org/apache/hadoop/hbase/regionserver/KeyValue.java
Here is an excerpt from the class comment:
{code}
* Utility for making, comparing and fetching pieces of a hbase KeyValue blob.
* Blob format is: <keylength> <valuelength> <key> <value>
* Key is decomposed as: <rowlength> <row> <columnfamilylength> <columnfamily>
* <columnqualifier> <timestamp> <keytype>
* Rowlength maximum is Short.MAX_SIZE, column family length maximum is
* Byte.MAX_SIZE, and column qualifier + value length must be < Integer.MAX_SIZE.
* The column does not contain the family/qualifier delimiter.
{code}
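To make the layout in the class comment concrete, here is a minimal sketch that writes one blob into a ByteBuffer and reads pieces back by offset. It is not the code from the linked KeyValue.java. The class and method names are hypothetical, and the field widths (int for keylength and valuelength, short for rowlength, byte for columnfamilylength, long for timestamp, byte for keytype) are assumptions that the comment only hints at through its stated maximums. The qualifier has no length prefix in the comment, so the sketch infers its length from keylength.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch of <keylength> <valuelength> <key> <value>; not the real KeyValue class.
final class KeyValueLayoutSketch {
    // Assumed widths: two int length fields in front, then the key.
    static final int KEY_START = 4 + 4;
    // Assumed fixed key tail: long timestamp + byte keytype.
    static final int TS_AND_TYPE = 8 + 1;

    static ByteBuffer encode(byte[] row, byte[] family, byte[] qualifier,
                             long timestamp, byte keyType, byte[] value) {
        if (row.length > Short.MAX_VALUE) throw new IllegalArgumentException("row too long");
        if (family.length > Byte.MAX_VALUE) throw new IllegalArgumentException("family too long");
        int keyLength = 2 + row.length + 1 + family.length + qualifier.length + TS_AND_TYPE;
        ByteBuffer buf = ByteBuffer.allocate(KEY_START + keyLength + value.length);
        buf.putInt(keyLength);
        buf.putInt(value.length);
        buf.putShort((short) row.length);
        buf.put(row);
        buf.put((byte) family.length);
        buf.put(family);
        buf.put(qualifier);          // no length prefix; derived from keyLength on read
        buf.putLong(timestamp);
        buf.put(keyType);
        buf.put(value);
        buf.flip();
        return buf;
    }

    static byte[] row(ByteBuffer kv) {
        int rowLength = kv.getShort(KEY_START);
        return copy(kv, KEY_START + 2, rowLength);
    }

    static byte[] qualifier(ByteBuffer kv) {
        int keyLength = kv.getInt(0);
        int rowLength = kv.getShort(KEY_START);
        int familyOffset = KEY_START + 2 + rowLength;
        int familyLength = kv.get(familyOffset);
        int qualifierOffset = familyOffset + 1 + familyLength;
        int qualifierLength = KEY_START + keyLength - TS_AND_TYPE - qualifierOffset;
        return copy(kv, qualifierOffset, qualifierLength);
    }

    static long timestamp(ByteBuffer kv) {
        return kv.getLong(KEY_START + kv.getInt(0) - TS_AND_TYPE);
    }

    static byte[] value(ByteBuffer kv) {
        return copy(kv, KEY_START + kv.getInt(0), kv.getInt(4));
    }

    // Copies a range out without disturbing the caller's position or limit.
    private static byte[] copy(ByteBuffer kv, int offset, int length) {
        byte[] out = new byte[length];
        ByteBuffer dup = kv.duplicate();
        dup.position(offset);
        dup.get(out, 0, length);
        return out;
    }

    public static void main(String[] args) {
        byte[] row = "row1".getBytes(StandardCharsets.UTF_8);
        byte[] family = "cf".getBytes(StandardCharsets.UTF_8);
        byte[] qualifier = "col".getBytes(StandardCharsets.UTF_8);
        byte[] value = "hello".getBytes(StandardCharsets.UTF_8);
        // (byte) 4 is only a placeholder key type for this sketch.
        ByteBuffer kv = encode(row, family, qualifier, 1234L, (byte) 4, value);
        System.out.println(new String(row(kv), StandardCharsets.UTF_8));
        System.out.println(new String(qualifier(kv), StandardCharsets.UTF_8));
        System.out.println(timestamp(kv));
        System.out.println(Arrays.equals(value(kv), value));
    }
}
```

Because every piece is located by arithmetic on the length fields, a reader can compare or fetch the row, qualifier, or timestamp without materializing a separate key object, which is the point of moving away from HStoreKey.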
Here are some notes on what I've learned as part of the rewrite:
+ Turns out we were doing a bunch of expensive column-matching lookups
-- 10%+ of all CPU in a recent seek+scan 1000 rows test -- that were not
necessary at all. The column match was being done in a store/family context,
so much of the column family parsing, and the fetching from maps of column
matchers to find which one to use for a particular column, was not needed.
+ How deletes work will have to be redone now that we have a richer delete
vocabulary. What was there previously was ugly anyway, so no harm in a rewrite
except for the work of debugging the new implementation.
+ We need to make the ByteBuffer that holds the KV that comes out of the hfile
read-only.
+ Will need to redo memcache size calculations (need Ryan's and Erik's help here).
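On the read-only note above, one possible approach is the standard java.nio call ByteBuffer.asReadOnlyBuffer(). The sketch below is only an illustration under that assumption. The class and method names are hypothetical, and the byte array stands in for a block read from an hfile.

```java
import java.nio.ByteBuffer;
import java.nio.ReadOnlyBufferException;

// Hypothetical sketch: hand out a read-only window onto a larger block.
final class ReadOnlyKeyValueSketch {
    // Returns a read-only view of block[offset, offset + length) that shares
    // the underlying bytes (no copy) but rejects writes.
    static ByteBuffer readOnlyView(ByteBuffer block, int offset, int length) {
        ByteBuffer dup = block.duplicate();   // independent position and limit
        dup.limit(offset + length);
        dup.position(offset);
        return dup.slice().asReadOnlyBuffer();
    }

    // True if an absolute put on the buffer is refused.
    static boolean rejectsWrites(ByteBuffer buf) {
        try {
            buf.put(0, (byte) 0);
            return false;
        } catch (ReadOnlyBufferException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        ByteBuffer block = ByteBuffer.wrap(new byte[] {10, 20, 30, 40, 50});
        ByteBuffer kv = readOnlyView(block, 1, 3);
        System.out.println(kv.remaining());      // bytes visible through the view
        System.out.println(kv.get(0));           // first byte of the view
        System.out.println(rejectsWrites(kv));   // writes through the view fail
        System.out.println(kv.hasArray());       // backing array is not exposed
    }
}
```

One consequence to keep in mind with this approach: a read-only buffer reports hasArray() as false, so any code that reaches for the backing byte array directly would have to go through the buffer's get methods instead.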
> Rearchitecting of server, client, API, key format, etc for 0.20
> ---------------------------------------------------------------
>
> Key: HBASE-1249
> URL: https://issues.apache.org/jira/browse/HBASE-1249
> Project: Hadoop HBase
> Issue Type: Improvement
> Reporter: Jonathan Gray
> Priority: Blocker
> Fix For: 0.20.0
>
>
> To discuss all the new and potential issues coming out of the change in key
> format (HBASE-1234): zero-copy reads, client binary protocol, update of API
> (HBASE-880), server optimizations, etc...
--
This message is automatically generated by JIRA.