The cost of serialization is nontrivial and is a substantial part of
the expense of getting data from regionserver to client.  I did some
timings: sending data across the wire is surprisingly slow, but
compressing it with various compression codecs took 50-100 ms in the
average case (1-5 MB Result[] sets).
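
For anyone who wants to repeat that kind of timing, here is a minimal
sketch.  It is not what I ran: the payload is a synthetic stand-in for
a serialized Result[] (half repetitive, half random bytes), zlib is
just one codec picked for illustration, and make_payload and
time_compress are made-up helper names.  Numbers will vary by machine
and data.

```python
import os
import time
import zlib
from typing import Tuple


def make_payload(size_bytes: int) -> bytes:
    """Build a synthetic payload: half repetitive, half random bytes.

    Assumption: this only stands in for a serialized Result[]; real
    HBase data will compress differently.
    """
    half = size_bytes // 2
    repetitive = (b"row-key/family:qualifier/" * (half // 25 + 1))[:half]
    return repetitive + os.urandom(size_bytes - half)


def time_compress(payload: bytes, level: int) -> Tuple[float, int]:
    """Return (elapsed milliseconds, compressed size) for one zlib pass."""
    start = time.perf_counter()
    compressed = zlib.compress(payload, level)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    # Round-trip check (outside the timed region) so we only report
    # timings for output that actually decompresses to the input.
    assert zlib.decompress(compressed) == payload
    return elapsed_ms, len(compressed)


if __name__ == "__main__":
    for size_mb in (1, 5):
        payload = make_payload(size_mb * 1024 * 1024)
        for level in (1, 6, 9):
            ms, out = time_compress(payload, level)
            print(f"{size_mb} MB level={level}: {ms:.1f} ms -> {out} bytes")
```

A single pass per size is noisy; repeating each measurement and taking
the median would give a fairer picture.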

When we originally conceptualized the Thrift interface, the thought
was to send the KeyValue byte[] over Thrift as an opaque blob rather
than model it as a full structure, i.e. no KeyValue struct with a
field for each part of a KeyValue.  On large results the cost of
serializing such a structure becomes prohibitive.
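
To make the blob-versus-structure difference concrete, here is a rough
sketch of the framing overhead only.  Everything in it is an
assumption for illustration: the field list, the 7-byte per-field
header (type tag, field id, length) plus a stop byte, and the plain
concatenation used as the "blob" are not the real KeyValue layout or
any particular Thrift protocol.

```python
import struct

# Hypothetical KeyValue parts; names and framing are illustrative only.
FIELDS = ("row", "family", "qualifier", "timestamp", "type", "value")


def encode_opaque(kv_bytes: bytes) -> bytes:
    """Frame an already-serialized KeyValue as one length-prefixed blob."""
    return struct.pack(">I", len(kv_bytes)) + kv_bytes


def encode_structured(parts: dict) -> bytes:
    """Frame each part separately.

    Per field: 1-byte type tag, 2-byte field id, 4-byte length, then the
    bytes.  A single stop byte ends the struct.
    """
    out = bytearray()
    for field_id, name in enumerate(FIELDS, start=1):
        data = parts[name]
        out += struct.pack(">bhI", 11, field_id, len(data))
        out += data
    out += b"\x00"
    return bytes(out)


if __name__ == "__main__":
    parts = {
        "row": b"row-0001",
        "family": b"cf",
        "qualifier": b"q",
        "timestamp": struct.pack(">q", 1306873320000),
        "type": b"\x04",
        "value": b"v" * 10,
    }
    # Stand-in for the serialized KeyValue: the parts simply concatenated.
    kv_bytes = b"".join(parts[name] for name in FIELDS)
    print("payload bytes:   ", len(kv_bytes))
    print("opaque framing:  ", len(encode_opaque(kv_bytes)))
    print("per-field framing:", len(encode_structured(parts)))
```

The byte overhead is only part of the story: the per-field version
also means one object and several copies per KeyValue on each side,
which is where the time goes on multi-megabyte results.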

While HTTP has high header overhead, if one wanted to be
HTTP-oriented one could use SPDY: http://www.chromium.org/spdy

The nice thing is that HTTP has good interoperability and the like.
The bad thing is that it is too verbose.
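
A quick way to put a number on the verbosity is to compute what
fraction of a request is request line plus headers.  This is only a
sketch: header_overhead is a made-up helper, and the header set below
is an invented example, not what any HBase REST client actually sends.

```python
def header_overhead(headers: dict, request_line: str, body_len: int) -> float:
    """Fraction of the bytes on the wire that are request line + headers."""
    head = request_line + "\r\n"
    head += "".join(f"{k}: {v}\r\n" for k, v in headers.items())
    head += "\r\n"  # blank line that terminates the header section
    head_len = len(head.encode("ascii"))
    return head_len / (head_len + body_len)


if __name__ == "__main__":
    # Hypothetical headers for a small get; values are placeholders.
    headers = {
        "Host": "regionserver.example.com:8080",
        "Accept": "application/octet-stream",
        "User-Agent": "example-client/0.1",
        "Connection": "keep-alive",
    }
    for body_len in (64, 1024, 1024 * 1024):
        frac = header_overhead(headers, "GET /table/row HTTP/1.1", body_len)
        print(f"body={body_len} bytes: headers are {frac:.1%} of the message")
```

For small gets and puts the headers dominate; for multi-megabyte scans
they are noise, so the answer depends on the request mix.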

-ryan

On Tue, May 31, 2011 at 1:22 PM, Stack <[email protected]> wrote:
> On Mon, May 30, 2011 at 9:55 PM, Eric Yang <[email protected]> wrote:
>> Maven modularization could be enhanced to have a structure that looks like this:
>>
>> Super POM
>>  +- common
>>  +- shell
>>  +- master
>>  +- region-server
>>  +- coprocessor
>>
>> The software is basically grouped by process type (role of the process),
>> plus a shared library.
>>
>
> I'd change the list above.  shell should be client and perhaps master
> and regionserver should be both inside a single 'server' submodule.
> We need to add security in there.  Perhaps we'd have a submodule for
> thrift, avro, rest (and perhaps rest war file)?  (Is this too many
> submodules?  I suppose once we are submodularized, adding new ones
> is trivial.  It's the initial move to submodules that is painful.)
>
>
>> For RPC, there are several feasible options: avro, thrift and jackson+jersey
>> (REST).  Avro may seem cumbersome because the schema is defined as a JSON
>> string.  Thrift comes with its own rpc server, and it is not trivial to add
>> authorization and authentication to secure the rpc transport.
>> Jackson+Jersey has the biggest RPC message size compared to Avro and
>> thrift.  All three frameworks have pros and cons, but I think Jackson+Jersey
>> has the right balance for an rpc framework.  In most cases,
>> pluggable RPC can be narrowed down to two main categories of use cases:
>>
>> 1. Freedom to create the most efficient rpc, which is hard to integrate with
>> everything else because it's custom made.
>> 2. Being able to evolve message passing and versioning.
>>
>> If we can see beyond the first reason and realize that the second reason is
>> in part about polymorphic serialization, then Jackson+Jersey is probably the
>> better choice as an RPC framework, because Jackson supports polymorphic
>> serialization and Jersey builds on the HTTP protocol.  It would be easier to
>> version and to add security on top of existing standards.  The syntax and
>> feature set seem better engineered to me.
>>
>
> I always considered http attractive but much too heavy-weight for hbase
> rpc; each request/response would carry a bunch of what are for the
> most part extraneous headers.  I suppose we should just measure.
> Regarding JSON messages, that's interesting, but hbase is all about binary
> data.  Does jackson/jersey do BSON?
>
> St.Ack
>
