[
https://issues.apache.org/jira/browse/HBASE-6226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13399550#comment-13399550
]
Matt Corgan commented on HBASE-6226:
------------------------------------
Thanks for looking it over, Mikhail. I've been thinking about it from a few
different angles:
Testing:
DataBlockEncoding is essentially a "codec" between KeyValues and byte[]s.
There is no threading, locking, IO, timeouts, retries, etc. There shouldn't
even be exceptions. It would be nice to isolate the codec logic to minimize
its dependencies and to "harden" it as much as possible through pure unit
tests rather than heavyweight integration tests. We should be able to prove
the correctness of the codecs and their edge cases without usage of a
minicluster, etc. We will still need higher-level tests using the minicluster,
but the low level tests should be extremely focused on the codec details. For
example, I think some of the DeltaEncoders have some pretty complex and fragile
logic that we could test a little more thoroughly if the testing environment
were simpler.
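To make the testing point concrete, here is a minimal sketch of the kind of pure unit test meant above. The `PrefixDeltaCodec` and `RoundTripCheck` classes are hypothetical and have no HBase dependency; the format (shared-prefix length, suffix length, suffix bytes per key) is a toy stand-in for a delta encoder, not HBase's actual on-disk encoding. The point is that round-trip correctness and edge cases (empty block, empty key, duplicate keys) can be checked with no minicluster, threads, or IO.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical toy codec: prefix-delta encoding of keys (not an HBase class). */
final class PrefixDeltaCodec {
    private PrefixDeltaCodec() {}

    /** Layout: key count, then per key (sharedPrefixLen, suffixLen, suffix bytes). */
    static byte[] encode(List<byte[]> keys) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(keys.size());
            byte[] prev = new byte[0];
            for (byte[] key : keys) {
                // Sorted input maximizes the shared prefix, but any order round-trips.
                int shared = commonPrefix(prev, key);
                out.writeInt(shared);
                out.writeInt(key.length - shared);
                out.write(key, shared, key.length - shared);
                prev = key;
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            // Writing to an in-memory stream does not fail in practice.
            throw new IllegalStateException(e);
        }
    }

    /** Rebuilds each key from the previous key's prefix plus its stored suffix. */
    static List<byte[]> decode(byte[] block) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(block));
            int count = in.readInt();
            List<byte[]> keys = new ArrayList<>(count);
            byte[] prev = new byte[0];
            for (int i = 0; i < count; i++) {
                int shared = in.readInt();
                int suffixLen = in.readInt();
                byte[] key = Arrays.copyOf(prev, shared + suffixLen);
                in.readFully(key, shared, suffixLen);
                keys.add(key);
                prev = key;
            }
            return keys;
        } catch (IOException e) {
            throw new IllegalArgumentException("truncated or corrupt block", e);
        }
    }

    private static int commonPrefix(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        int i = 0;
        while (i < n && a[i] == b[i]) i++;
        return i;
    }
}

/** Hypothetical unit-test helper: encode then decode must return identical keys. */
final class RoundTripCheck {
    static boolean roundTrips(List<byte[]> keys) {
        List<byte[]> decoded = PrefixDeltaCodec.decode(PrefixDeltaCodec.encode(keys));
        if (decoded.size() != keys.size()) return false;
        for (int i = 0; i < keys.size(); i++) {
            if (!Arrays.equals(keys.get(i), decoded.get(i))) return false;
        }
        return true;
    }
}
```

Each edge case becomes a one-line assertion against `RoundTripCheck.roundTrips`, which is the kind of focused test that is hard to write when the codec is entangled with HFileBlock and the regionserver.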
Maintenance:
It would be good to eventually move codecs like this into a module separate
from hbase-server. That isolates the code and tests, so developers know that
danger lurks inside those modules and committers can keep a sharp eye out for
any changes that affect them. I'm aiming to make a module called
hbase-prefix-tree to hold the pluggable classes for the prefix-tree
implementation, and it may grow to support trie-encoded block indexes and
memstores. I'd also suggest that some of the other current HFileBlock codecs
go into an hbase-codec module, even though neither module formally exists yet.
Client:
I haven't brought this up because I don't want to be too starry-eyed, but I
think it is actually appropriate for the client to make use of the
DataBlockEncoders to encode KeyValues on-the-fly over the wire. It brings the
same cost/benefit trade-off as encoding for disk space or block cache space.
An even more advanced feature would be to pass entire data blocks over the wire
for certain use cases (primarily unfiltered scans), and let the client decode
them, saving a ton of server CPU. Others have mentioned rewriting the client
from scratch for various reasons, and I would love to see these encodings built
in from the start.
Dependency graph:
Stack, Jesse, and I did some brainstorming on the path to modularization, and I
suggested the separation of the codecs from the hbase-server module. There's a
diagram on HBASE-5977. We would try to extract and encapsulate the logic for
decoding what's inside each type of HFileBlock, creating a class hierarchy of
HFileBlock implementations that implement interfaces like BloomChunkBlock.
From the perspective of hbase-server, everything behind those interfaces is a
perfectly tested, high-performance black box.
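A rough sketch of the "black box" idea follows. Only the name BloomChunkBlock comes from the discussion; its single `mightContain` method, the `SimpleBloomChunk` class, its double-hashing scheme, and the `ServerSideLookup` caller are hypothetical stand-ins, not HBase's actual Bloom filter code. The caller is written against the interface alone, so the implementation behind it can be replaced or unit-tested in isolation.

```java
import java.util.Arrays;
import java.util.BitSet;

/** Hypothetical block-level interface: server code sees only this contract. */
interface BloomChunkBlock {
    /** false means the key is definitely absent; true means it may be present. */
    boolean mightContain(byte[] key);
}

/** Hypothetical implementation hidden behind the interface (toy Bloom filter). */
final class SimpleBloomChunk implements BloomChunkBlock {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    SimpleBloomChunk(int numBits, int numHashes) {
        if (numBits <= 0 || numHashes <= 0) {
            throw new IllegalArgumentException("numBits and numHashes must be positive");
        }
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    void add(byte[] key) {
        for (int i = 0; i < numHashes; i++) {
            bits.set(index(key, i));
        }
    }

    @Override
    public boolean mightContain(byte[] key) {
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(index(key, i))) return false;
        }
        return true;
    }

    /** Double hashing: probe i lands at (h1 + i * h2) mod numBits. */
    private int index(byte[] key, int i) {
        int h1 = Arrays.hashCode(key);
        int h2 = fnv1a(key) | 1; // force a nonzero step
        return Math.floorMod(h1 + i * h2, numBits);
    }

    /** 32-bit FNV-1a hash, used as the second, independent hash. */
    private static int fnv1a(byte[] key) {
        int h = 0x811C9DC5;
        for (byte b : key) {
            h ^= (b & 0xFF);
            h *= 0x01000193;
        }
        return h;
    }
}

/** Hypothetical server-side caller: depends on the interface, not the class. */
final class ServerSideLookup {
    static boolean shouldReadDataBlock(BloomChunkBlock bloom, byte[] row) {
        return bloom.mightContain(row);
    }
}
```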
I'm actually proposing that hbase-server should not be able to see into
hbase-codec, and that hbase-codec should not see into hbase-server. They both would see into
hbase-common where we store the DataBlockEncoding enum and the DataBlockEncoder
and EncodedSeeker interfaces. In this case the DataBlockEncoding enum goes
into the hbase-common module and contains strings pointing to the
implementation classes, and the implementations are instantiated via
reflection. This also sets up a simple framework for additional codecs
(possibly highly customized to a particular use case) to be developed with
minimal effect on hbase-server.
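The enum-plus-reflection arrangement could look roughly like this sketch. Only the DataBlockEncoding and DataBlockEncoder names come from the comment; the single-pair `encode`/`decode` interface, the enum constants, the `NoneEncoder` class, and the class-name strings are hypothetical. The enum, standing in for hbase-common, holds implementation classes only as strings, so it has no compile-time reference to any codec module; each class is loaded by name with `Class.forName` when an encoder is requested.

```java
/** Hypothetical, minimal stand-in for the DataBlockEncoder interface in hbase-common. */
interface DataBlockEncoder {
    byte[] encode(byte[] rawBlock);
    byte[] decode(byte[] encodedBlock);
}

/** Hypothetical enum: knows implementations only by class name, never by type. */
enum DataBlockEncoding {
    NONE("NoneEncoder"),
    // Hypothetical class absent from this classpath, like a codec module not deployed.
    PREFIX_TREE("hypothetical.prefixtree.PrefixTreeEncoder");

    private final String encoderClassName;

    DataBlockEncoding(String encoderClassName) {
        this.encoderClassName = encoderClassName;
    }

    /** Loads the named class and calls its public no-arg constructor via reflection. */
    DataBlockEncoder newEncoder() {
        try {
            Class<?> cls = Class.forName(encoderClassName);
            return (DataBlockEncoder) cls.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(
                "No usable encoder class " + encoderClassName + " for " + name(), e);
        }
    }
}

/** Hypothetical implementation; in the proposal it would live in a codec module. */
final class NoneEncoder implements DataBlockEncoder {
    public NoneEncoder() {}

    @Override
    public byte[] encode(byte[] rawBlock) {
        return rawBlock.clone(); // identity "encoding"
    }

    @Override
    public byte[] decode(byte[] encodedBlock) {
        return encodedBlock.clone();
    }
}
```

Because the enum holds only strings, the common module compiles without any codec module present, and a missing module surfaces as a runtime error naming the class rather than as a build-time dependency.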
Anyway, hopefully that makes sense. All in all, I see hbase-common as the
"kernel" of the project, not simply a code-gateway between client and server.
It should contain the core interfaces and classes that are fundamental to the
concept of HBase, on the assumption that they are simple enough to be
thoroughly tested via unit tests.
> move DataBlockEncoding and related classes to hbase-common module
> -----------------------------------------------------------------
>
> Key: HBASE-6226
> URL: https://issues.apache.org/jira/browse/HBASE-6226
> Project: HBase
> Issue Type: Improvement
> Components: io, regionserver
> Affects Versions: 0.96.0
> Reporter: Matt Corgan
> Assignee: Matt Corgan
> Attachments: HBASE-6226-v1.patch
>
>
> In order to isolate the implementation details of HBASE-4676 (PrefixTrie
> encoding) and other DataBlockEncoders by putting them in modules, this pulls
> up the DataBlockEncoding related interfaces into hbase-common.
> No tests are moved in this patch. The only notable change was trimming a few
> dependencies on HFileBlock, which pulls in dependencies on much of the
> regionserver.
> The test suite passes locally for me.
> I tried to keep it as simple as possible... let me know if there are any
> concerns.