When I started writing it, I was not yet aware of zmq, so the connection
layer is just written using boost. It would be quite simple to replace
that with zmq; I just have not gotten around to it yet.
> i really don't see the need for people reinventing the low level rpc
> stuff over and over again.
I totally agree! What I like about protobufs, however, is that you can
write your service definition directly in the .proto file, which the
protocol buffer compiler parses for you, and you can access the parsed
definitions via protoc plugins, for example. With the plugins and the
protoc compiler nicely integrated into a cross-language build system,
having clients in different languages is not minimal effort, it is zero
effort. It also takes a little bit of code to support multiple services
on the same port, to list all services, to retrieve service definitions,
etc. Basically, I have just taken care of that part.
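For illustration, such a service definition in a .proto file might look
like this (the message and service names here are made up, not from my
actual code):

```proto
// greeter.proto -- hypothetical example (proto2 syntax)

package demo;

message HelloRequest {
  required string name = 1;
}

message HelloReply {
  required string greeting = 1;
}

// protoc parses this definition; a plugin can then generate client
// stubs and server skeletons for each target language from it.
service Greeter {
  rpc SayHello (HelloRequest) returns (HelloReply);
}
```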
> for rpc the comparison of speed is a bit of a moot point, since all
> the latency will be in the communication, not so much in the
> serialization, i suspect.
Yes, the communication of course adds latency, but I am talking about
throughput here. If you believe the comparison of speed is a bit of a
moot point, then you must be one of the fortunate people who never had
to use SOAP ;) You also need to properly hide the latency and make sure
you minimize copying data around in memory, etc. But I am certain
zeromq does a much better job there than my boost implementation ;) I
haven't benchmarked it yet, though.
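Hiding latency basically means keeping many requests in flight instead
of waiting for each reply before sending the next. A minimal sketch in
Python (the fake RPC call and its 10 ms "latency" are simulated, just
to show the effect):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_rpc(x):
    """Stand-in for a remote call: the sleep simulates network round-trip latency."""
    time.sleep(0.01)
    return x * x

requests = list(range(20))

# Sequential: every call pays the full round-trip latency.
start = time.time()
sequential = [fake_rpc(x) for x in requests]
t_sequential = time.time() - start

# Pipelined: many requests in flight at once, so the latencies overlap
# and throughput is no longer bounded by the round-trip time.
start = time.time()
with ThreadPoolExecutor(max_workers=20) as pool:
    pipelined = list(pool.map(fake_rpc, requests))
t_pipelined = time.time() - start

print(f"sequential: {t_sequential:.2f}s, pipelined: {t_pipelined:.2f}s")
```

The same idea applies whether the overlap comes from threads,
asynchronous clients, or zmq's pipelining.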
> but once you communicate using protobuf it also becomes really
> tempting to store in hadoop using protobuf instead of
> writables/sequencefiles, and from what i have heard (i have not tested
> this myself) it is a good deal slower in that situation.
What exactly do you mean by just using protobuf instead of
writables/sequencefiles? I.e., assuming you just use some
ProtobufToWritable adapter, I don't see how that would be much slower
than using writables; writables and protobufs really do the same job,
do they not? Except that protobufs are available in other languages,
are defined via the proto language, etc. Whether you use writables or
protobufs, you can most likely serialize faster than you can write to
disk or to the network. At least that is my feeling so far from using
protobufs to store stuff in hbase or raw hfiles, though I have to
admit I have not properly benchmarked this. What kind of file format
would you write serialized protobufs to that would make it so slow? I
guess in the end, one just needs to benchmark everything :)
TL;DR
.proto files plus protocol buffer plugins for generating RPC clients
and servers are really handy. Whether writables or protobufs are faster
needs to be benchmarked, but probably both serialize faster than one
can write to disk or network.
On 23.09.2011 15:40, Koert Kuipers wrote:
did you build it on top of zmq? i really don't see the need for people
reinventing the low level rpc stuff over and over again. zmq comes
with baked in request-response, pub-sub, and pipeline (distributed
processing) communication. once you rely on protobuf + zmq for the rpc
it is trivial to add clients in other languages, i had java, R and
python talking to each other with minimal effort.
for rpc the comparison of speed is a bit of a moot point, since all
the latency will be in the communication, not so much in the
serialization, i suspect. but once you communicate using protobuf it
also becomes really tempting to store in hadoop using protobuf instead
of writables/sequencefiles, and from what i have heard (i have not
tested this myself) it is a good deal slower in that situation.
On Fri, Sep 23, 2011 at 8:14 AM, Stephan Gammeter
<gamme...@vision.ee.ethz.ch> wrote:
I don't think protobufs are actually slower than writables; they do
really well in speed. I actually wrote some RPC code in C++ for
protocol buffers and some swig wrappers to have clients in Java. A
simple C++ server can easily handle about 20k qps in that setup, and
this is just with a naive implementation where some excess data
copies still happen during the processing of requests. If I have time
I would like to open-source it, but I would need some help to get it
running properly in other languages, so that it can be truly
cross-language. (Right now servers are only supported in C++; clients
are synchronous and asynchronous in C++; in Java only synchronous
clients are supported.)
On 21.09.2011 22:59, Koert Kuipers wrote:
i would love an IDL, plus modern serialization frameworks such as
protobuf/thrift support versioning (although i still have issues with
different versions of thrift not working nicely together, argh, why is
that). the only downside is perhaps that they are a little slower than
writables.
On Wed, Sep 21, 2011 at 3:12 AM, Uma Maheswara Rao G 72686
<mahesw...@huawei.com> wrote:
Hadoop has its own RPC mechanism, mainly Writables, to overcome
some of the disadvantages of normal serialization.
For more info:
http://www.lexemetech.com/2008/07/rpc-and-serialization-with-hadoop.html
Regards,
Uma
----- Original Message -----
From: jie_zhou <jie_z...@xa.allyes.com>
Date: Wednesday, September 21, 2011 12:12 pm
Subject: A question about RPC
To: hdfs-user@hadoop.apache.org
> Dear:
>
> Nice to meet you!
>
> I am a beginner with Hadoop. Recently I have been reading the source
> of Hadoop's RPC, but now I have a question. As we know, Hadoop RPC
> makes use of the dynamic proxy mechanism, but
>
> why not use an IDL such as CORBA, or AIDL of Android?
>
> Thanks for your early reply.
>
> Best Regards,
>
> jie