The cost of serialization is nontrivial, and it is a substantial part of the expense of conveying information from regionserver -> client. I did some timings, and sending data across the wire is surprisingly slow, but attempting to compress it with various compression systems ended up taking 50-100 ms in the average case (1-5 MB Result[] sets).
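For anyone who wants to repeat that kind of measurement, here is a minimal sketch of timing compression on a payload in the 1-5 MB range. It is not the code used for the numbers above: the class name, the `fakePayload` helper, and the choice of GZIP from the JDK are all assumptions for illustration, and the synthetic bytes only stand in for a serialized Result[] set. Real numbers will depend on the data, the codec, and the hardware.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Random;
import java.util.zip.GZIPOutputStream;

// Hypothetical benchmark sketch; names are illustrative, not from HBase.
final class CompressionTiming {

    // Builds a synthetic, partly compressible payload that stands in
    // for a serialized Result[] set (assumption: real data differs).
    static byte[] fakePayload(int sizeBytes, long seed) {
        byte[] data = new byte[sizeBytes];
        Random rnd = new Random(seed);
        for (int i = 0; i < data.length; i++) {
            // Draw from a 16-symbol alphabet so the data is compressible.
            data[i] = (byte) ('a' + rnd.nextInt(16));
        }
        return data;
    }

    // Compresses the whole buffer with the JDK's GZIP implementation.
    static byte[] gzip(byte[] input) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream(input.length / 2);
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(input);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    // Returns the mean wall-clock time of one gzip() call, in milliseconds.
    static double meanGzipMillis(byte[] input, int runs) {
        long totalNanos = 0;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            gzip(input);
            totalNanos += System.nanoTime() - start;
        }
        return totalNanos / 1_000_000.0 / runs;
    }

    public static void main(String[] args) {
        for (int mb = 1; mb <= 5; mb += 2) {
            byte[] payload = fakePayload(mb * 1024 * 1024, 42L);
            gzip(payload); // warm-up run so JIT compilation skews the timing less
            double ms = meanGzipMillis(payload, 5);
            int packed = gzip(payload).length;
            System.out.printf("%d MB -> %d bytes, mean %.1f ms%n", mb, packed, ms);
        }
    }
}
```

The point of a sketch like this is only to compare the compression time against the time saved on the wire; if compressing costs more than the transfer it avoids, it is not a win.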
Originally, when conceptualizing Thrift, the thought was to just send the KeyValue byte[] over Thrift as an opaque blob rather than build a whole structure, i.e. no KeyValue structure with a field for each of the parts of a KeyValue. On large results the cost of such a structure becomes prohibitive.

While HTTP has a high overhead of headers, if one wanted to be HTTP-oriented, one could use SPDY:

http://www.chromium.org/spdy

The nice thing is that HTTP has a good set of interops and the like. The bad thing is that it is too verbose.

-ryan

On Tue, May 31, 2011 at 1:22 PM, Stack <[email protected]> wrote:
> On Mon, May 30, 2011 at 9:55 PM, Eric Yang <[email protected]> wrote:
>> Maven modulation could be enhanced to have a structure looks like this:
>>
>> Super POM
>>   +- common
>>   +- shell
>>   +- master
>>   +- region-server
>>   +- coprocessor
>>
>> The software is basically group by processor type (role of the process) and
>> a shared library.
>>
>
> I'd change the list above.  shell should be client and perhaps master
> and regionserver should be both inside a single 'server' submodule.
> We need to add security in there.  Perhaps we'd have a submodule for
> thrift, avro, rest (and perhaps rest war file)?  (Is this too many
> submodules -- I suppose once we are submodularized, adding new ones
> is trivial.  Its the initial move to submodules that is painful)
>
>
>> For RPC, there are several feasible options, avro, thrift and jackson+jersey
>> (REST).  Avro may seems cumbersome to define the schema in JSON string.
>> Thrift comes with it's own rpc server, it is not trivial to add
>> authorization and authentication to secure the rpc transport.
>> Jackson+Jersey RPC message is biggest message size compare to Avro and
>> thrift.  All three frameworks have pros and cons but I think Jackson+jersey
>> have the right balance for rpc framework.  In most of the use case,
>> pluggable RPC can be narrow down to two main category of use cases:
>>
>> 1. Freedom of creating most efficient rpc but hard to integrate with
>> everything else because it's custom made.
>> 2. Being able to evolve message passing and versioning.
>>
>> If we can see beyond first reason, and realize second reason is in part
>> polymorphic serialization.  This means, Jackson+Jersey is probably the
>> better choice as a RPC framework because Jackson supports polymorphic
>> serialization, and Jersey builds on HTTP protocol.  It would be easier to
>> versioning and add security on top of existing standards.  The syntax and
>> feature set seems more engineering proper to me.
>>
>
> I always considered http attactive but much too heavy-weight for hbase
> rpc; each request/response would carry a bunch of what are for the
> most part extraneous headers.  I suppose we should just measure.
> Regards JSON messages, thats interesting but hbase is all about binary
> data.  Does jackson/jersey do BSON?
>
> St.Ack
>
