[ 
https://issues.apache.org/jira/browse/HBASE-6226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13399550#comment-13399550
 ] 

Matt Corgan commented on HBASE-6226:
------------------------------------

Thanks for looking over it Mikhail.  I've been looking at it from a few 
different angles:

Testing:
DataBlockEncoding is essentially a "codec" between KeyValues and byte[]'s.  
There is no threading, locking, IO, timeouts, retries, etc.  There shouldn't 
even be exceptions.  It would be nice to isolate the codec logic to minimize 
it's dependencies and to "harden" it as much as possible through pure unit 
tests, rather than heavyweight integration tests.  We should be able to prove 
the correctness of the codecs and their edge cases without usage of a 
minicluster, etc.  We will still need higher-level tests using the minicluster, 
but the low level tests should be extremely focused on the codec details.  For 
example, I think some of the DeltaEncoders have some pretty complex and fragile 
logic that we could test a little more thoroughly if the testing environment 
were simpler.

Maintenance:
It would be good to eventually separate the codecs like this into a separate 
module than hbase-server.  It will isolate the code and tests so developers 
know that danger lurks inside those modules, and committers can keep a sharp 
eye out for any changes that affect that module.  I'm aiming to make a module 
called hbase-prefix-tree that can hold the pluggable implementation classes for 
the implementation, and it may grow to support trie-encoded block indexes and 
memstores.  That some of the other current HFileBlock codecs go into an 
hbase-codec module even though they don't formally exist yet.

Client:
I haven't brought this up because I don't want to be too starry-eyed, but I 
think it is actually appropriate for the client to make use of the 
DataBlockEncoders to encode KeyValues on-the-fly over the wire.  It brings the 
same cost/benefit trade-off as encoding for disk space or block cache space.  
An even more advanced feature would be to pass entire data blocks over the wire 
for certain use cases (primarily unfiltered scans), and let the client decode 
them, saving a ton of server cpu.  Others have mentioned re-writing the client 
from scratch for various reasons, and I would love to see these encodings built 
in from the start.

Dependency graph:
Stack, Jesse, and I did some brainstorming on the path to modularization, and I 
suggested the separation of the codecs from the hbase-server module.  There's a 
diagram on HBASE-5977.  We would try to extract and encapsulate the logic for 
decoding what's inside each type of HFileBlock, creating a class hierarchy of 
HFileBlock implementations that implement interfaces like BloomChunkBlock.  
From the perspective of hbase-server, everything behind those interfaces is a 
perfectly tested, high-performance black box.

I'm actually proposing that hbase-server cannot see into hbase-codec and that 
hbase-codec cannot see into hbase-server.  They both would see into 
hbase-common where we store the DataBlockEncoding enum and the DataBlockEncoder 
and EncodedSeeker interfaces.  In this case the DataBlockEncoding enum goes 
into the hbase-common module and contains strings pointing to the 
implementation classes, and the implementations are instantiated via 
reflection.  This also sets up a simple framework for additional codecs 
(possibly highly customized to a particular use case) to be developed with 
minimal effect on hbase-server.


Anyway, hopefully that makes sense.  All in all, I see hbase-common as the 
"kernel" of the project, not simply a code-gateway between client and server.  
It should contain the core interfaces and classes that are fundamental to the 
concept of hbase with the assumption that they are simple enough to be 
thoroughly tested via unit tests.
                
> move DataBlockEncoding and related classes to hbase-common module
> -----------------------------------------------------------------
>
>                 Key: HBASE-6226
>                 URL: https://issues.apache.org/jira/browse/HBASE-6226
>             Project: HBase
>          Issue Type: Improvement
>          Components: io, regionserver
>    Affects Versions: 0.96.0
>            Reporter: Matt Corgan
>            Assignee: Matt Corgan
>         Attachments: HBASE-6226-v1.patch
>
>
> In order to isolate the implementation details of HBASE-4676 (PrefixTrie 
> encoding) and other DataBlockEncoders by putting them in modules, this pulls 
> up the DataBlockEncoding related interfaces into hbase-common.
> No tests are moved in this patch.  The only notable change was trimming a few 
> dependencies on HFileBlock which adds dependencies to much of the 
> regionserver.
> The test suite passes locally for me.
> I tried to keep it as simple as possible... let me know if there are any 
> concerns.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Reply via email to