When i started writing i was not aware of zmq yet, so the connection layer is just written using boost, but it would be quite simple to replace that with zmq, just have not gotten around to it yet. > i really don't see the need for people reinventing the low level rcp stuff over and over again. I totally agree! The thing I like about the protobufs is however that you can directly write your service definition in the .proto file which gets parsed for you by the protocol buffer compiler and you can access the parsed definitions via protoc plugins for example. With the plugins + protoc compiler nicely integrated into a cross language build system, it's not minimal effort to have clients in different languages, it's zero effort. Also it takes a little bit of code if you want to have multiple services on the same port, being able to list all services and get service definitions, etc... Basically i have just taken care of that part.

> for rcp the comparison of speed is a bit of a moot point, since all the latency will be in the communication, not so much in the serialization, i suspect. Yes the communication of course adds latency but i am talking about throughput here. If you believe comparison of speed is a bit of a moot point, then you must be one of the fortunate people that never had to use SOAP ;) also you need to properly hide the latency and also make sure you minimize copying data around in memory etc... But I am certain zeromq does a much better job there than my boost implementation ;) haven't benchmarked it yet though.

> but once you communicate using protobuf it also becomes really tempting to store in hadoop using protobuf instead of writables/sequencefiles, and from what i have heard (i have not tested this myself) it is a good deal slower in that situation. What do you mean of just using protobuf instead of writables/sequencefiles exactly? I.e. let's assume you just use some ProtobufToWritable adapter, i don't see how that would be much slower than using writables, writables and protobufs really just do the same job, do they not? Except that protobufs are available in other languages, are defined via the proto language etc... If you use writables or protobufs, you most likely can serialize faster than you can write to disk or to network. At least that is my feeling so far from using protobufs to store stuff in hbase or raw hfiles, but i have to admit, i have not properly benchmarked this. What kind of fileformat would you use to write serialized protobufs to, that would make it so slow? I guess in the end, one just needs to benchmark everything :)

TL;DR
.proto + protocol buffer plugins for generating rpc clients and servers is really handy. If writables or protobufs are faster needs to be benchmarked, but probably both serialize faster than one can write.

On 23.09.2011 15:40, Koert Kuipers wrote:
did you build it on top of zmq? i really don't see the need for people reinventing the low level rcp stuff over and over again. zmq comes with baked in request-response, pub-sub, and pipeline (distributed processing) communication. once you rely on protobuf + zmq for the rpc is it trivial to add clients in other languages, i had java, R and python talking to each other with minimal effort.

for rcp the comparison of speed is a bit of a moot point, since all the latency will be in the communication, not so much in the serialization, i suspect. but once you communicate using protobuf it also becomes really tempting to store in hadoop using protobuf instead of writables/sequencefiles, and from what i have heard (i have not tested this myself) it is a good deal slower in that situation.

On Fri, Sep 23, 2011 at 8:14 AM, Stephan Gammeter <gamme...@vision.ee.ethz.ch <mailto:gamme...@vision.ee.ethz.ch>> wrote:

    I don't think protobuf are slower than writable actually, they do
    really well in speed. I actually wrote some rpc code in C++ for
    protocolbuffers and some swig wrappers to have clients in java. A
    simple c++ server can easily handle about 20k qps in that setup
    and this is just with a naive implementation where still some
    excess data copies happen during the processing of requests. If i
    have time i would like to opensource it, but i would need some
    help to get it running properly in other languages, so that it can
    be truly cross language. (right now servers are only supported in
    c++, clients are synchronous and asynchronous in c++, in java only
    synchronous clients are supported)


    On 21.09.2011 22 <tel:21.09.2011%2022>:59, Koert Kuipers wrote:
    i would love an IDL, plus that modern serialization frameworks
    such as protobuf/thrift support versioning (although i still have
    issues with different versions of thrift not working nicely
    together, argh why is that). the only downside is perhaps that
    they are a little slower than writables.

    On Wed, Sep 21, 2011 at 3:12 AM, Uma Maheswara Rao G 72686
    <mahesw...@huawei.com <mailto:mahesw...@huawei.com>> wrote:

        Hadoop has its RPC machanism mainly Writables to overcome
        some of the disadvantages on normal serializations.
        For more info:
        http://www.lexemetech.com/2008/07/rpc-and-serialization-with-hadoop.html

        Regards,
        Uma
        ----- Original Message -----
        From: jie_zhou <jie_z...@xa.allyes.com
        <mailto:jie_z...@xa.allyes.com>>
        Date: Wednesday, September 21, 2011 12:12 pm
        Subject: A question about RPC
        To: hdfs-user@hadoop.apache.org
        <mailto:hdfs-user@hadoop.apache.org>

        > Dear:
        >
        > Nice to meet you!
        >
        > I am a beginner of hadoop. Recently, I have seen the source
        of RPC of
        > hadoop,but now I have a question. As we know,hadoop RPC
        make use
        > of Dynamic
        > proxy mechanism ,but
        >
        > why not use IDL such as CORBA, or AIDL of Android?
        >
        > Thanks for your early reply.
        >
        > Best Regards,
        >
        > jie
        >
        >
        >
        >
        >
        >
        >
        >
        >
        >





Reply via email to