Protobuf +1

Even though Google hasn't officially open-sourced its internal protobuf-based RPC, there are lots of third-party implementations for almost all of the popular languages. After all, protobuf gives me a better user experience than Thrift :)

Thanks,
Min

On Sat, Sep 15, 2012 at 9:56 AM, Clark Yang (杨卓荦) <[email protected]> wrote:

> protobuf +1
> I don't think it is a standard problem. protobuf has already shown a great
> many benefits and success in many open source projects. It's widely used
> and there are few better alternatives, I think.
>
> BTW, I have posted the first comment of the first jira.
>
> Cheers,
> Zhuoluo (Clark) Yang
>
> 2012/9/15 Constantine Peresypkin <[email protected]>
>
> > I really have no idea how one can estimate telco traffic.
> > But I highly doubt that you can fruitfully compare the reliability of an
> > internal-only protocol (same implementation, easy to enforce compatibility)
> > to an interoperable one.
> >
> > On Sat, Sep 15, 2012 at 1:41 AM, Ryan Rawson <[email protected]> wrote:
> >
> > > I didn't say I was the one making the argument...
> > >
> > > Google has put probably > 10^24 bytes of data thru protobuf in
> > > multiple implementations (eg: serialization on disk and on wire RPC).
> > > That is a low estimate.
> > >
> > > I'd be interested in hearing what 20 years of telco protocol traffic
> > > might compare to 10 years of google's usage of protobuf. Exponential
> > > curve and all of that.
> > >
> > > On Fri, Sep 14, 2012 at 3:36 PM, Constantine Peresypkin
> > > <[email protected]> wrote:
> > > > More battle tested than a more than 20 year old standard used in almost
> > > > every telecom protocol that exists nowadays?
> > > > I think your statement is a little on the "too bold" side. :)
> > > >
> > > > On Sat, Sep 15, 2012 at 1:30 AM, Ryan Rawson <[email protected]> wrote:
> > > >
> > > >> Funny thing, given how much use protobufs has been put thru, I think
> > > >> one could make the argument it's more battle tested than ASN.1 ...
> > > >>
> > > >> On Fri, Sep 14, 2012 at 3:24 PM, Constantine Peresypkin
> > > >> <[email protected]> wrote:
> > > >> > Protobuf is an attempt to make ASN.1 more developer friendly (not a
> > > >> > bad attempt).
> > > >> > It's simpler, has far fewer features, is easier to implement and has
> > > >> > a compact encoding.
> > > >> > But on the other hand it's non-standard, a "reinvented wheel" (they
> > > >> > could just do a "better than PER" encoding for ASN.1), and AFAIK has
> > > >> > no support for the new and shiny Google encodings, like "group varint".
> > > >> > All in all, in the current situation it seems a better choice than
> > > >> > ASN.1, not even arguing about something even more vague and
> > > >> > non-standard as Thrift.
> > > >> >
> > > >> > On Sat, Sep 15, 2012 at 12:38 AM, Ryan Rawson <[email protected]> wrote:
> > > >> >
> > > >> >> Thanks for that Ted.
> > > >> >>
> > > >> >> Correct - internal wire format doesn't mean 'drill only supports
> > > >> >> protobuf encoded data'.
> > > >> >>
> > > >> >> Part of the reason to favor protobuf is that a lot of people in the
> > > >> >> broader 'big data' community are building a lot of experience with it.
> > > >> >> Hadoop and HBase both are moving to/moved to protobuf on the wire.
> > > >> >> Being able to leverage this expertise is valuable.
> > > >> >>
> > > >> >> There is a JIRA in Hadoop-land where someone had done a deep dive
> > > >> >> 'bake off' between thrift, protobuf and avro. The ultimate choice was
> > > >> >> protobuf for a number of reasons. If people want to re-do the
> > > >> >> analysis, I'd like to see it in the context of THAT analysis (eg: why
> > > >> >> the assumptions there are not the same for Drill)... if anything it'd
> > > >> >> give a concrete form to what can be a mire.
> > > >> >>
> > > >> >> For what it's worth, I've had many discussions along these angles with
> > > >> >> a variety of people including committers on Thrift, and the consensus
> > > >> >> is both are good choices.
> > > >> >>
> > > >> >> -ryan
> > > >> >>
> > > >> >> On Fri, Sep 14, 2012 at 2:31 PM, Ted Dunning <[email protected]> wrote:
> > > >> >> > I think that it is important to ask a few questions leading up to a
> > > >> >> > decision here.
> > > >> >> >
> > > >> >> > The first is a (rhetorical) show of hands about how many people
> > > >> >> > believe that there are no serious performance or expressivity killers
> > > >> >> > when comparing alternative serialization frameworks. As far as I know,
> > > >> >> > performance differences are not massive (and protobufs is one of the
> > > >> >> > leaders in any case) and the expressivity differences are essentially
> > > >> >> > nil. If somebody feels that there is a serious show-stopper with any
> > > >> >> > option, they should speak.
> > > >> >> >
> > > >> >> > The second is to ask the sense of the community whether they judge
> > > >> >> > progress or perfection in this decision is most important to the
> > > >> >> > project. My guess is that almost everybody would prefer to see
> > > >> >> > progress as long as the technical choice is not subject to some
> > > >> >> > horrid missing bit.
> > > >> >> >
> > > >> >> > The final question is whether it is reasonable to go along with
> > > >> >> > protobufs given that several very experienced engineers prefer it and
> > > >> >> > would like to produce code based on it.
> > > >> >> > If the first two questions are answered to the effect that protobufs
> > > >> >> > is about as good as we will find and that progress trumps small
> > > >> >> > differences, then it seems that moving to follow this preference of
> > > >> >> > Jason and Ryan for protobufs might be a reasonable thing to do.
> > > >> >> >
> > > >> >> > The question of an internal wire format, btw, does not constrain the
> > > >> >> > project relative to external access. I think it is important to
> > > >> >> > support JDBC and ODBC and whatever is in common use for querying. For
> > > >> >> > external access the question is quite different. Whereas for the
> > > >> >> > internal format consensus around a single choice has large benefits,
> > > >> >> > the external format choice is nearly the opposite. For an external
> > > >> >> > format, limiting ourselves to a single choice seems like a bad idea
> > > >> >> > and increasing the audience seems like a better choice.
> > > >> >> >
> > > >> >> > On Fri, Sep 14, 2012 at 12:44 PM, Ryan Rawson <[email protected]> wrote:
> > > >> >> >
> > > >> >> >> Hi folks,
> > > >> >> >>
> > > >> >> >> I just commented on this first JIRA. Here is my text:
> > > >> >> >>
> > > >> >> >> This issue has been hashed over a lot in the Hadoop projects. There
> > > >> >> >> was work done to compare thrift vs avro vs protobuf. The conclusion
> > > >> >> >> was protobuf was the decision to use.
> > > >> >> >>
> > > >> >> >> Prior to this move, there had been a lot of noise about pluggable RPC
> > > >> >> >> transports, and whatnot. It held up adoption of a backwards compatible
> > > >> >> >> serialization framework for a long time. The problem ended up being
> > > >> >> >> the analysis-paralysis, rather than the specific implementation
> > > >> >> >> problem.
> > > >> >> >> In other words, the problem was a LACK of implementation rather than
> > > >> >> >> actual REAL problems.
> > > >> >> >>
> > > >> >> >> Based on this experience, I'd strongly suggest adopting protobuf and
> > > >> >> >> moving on. Forget about pluggable RPC implementations, the complexity
> > > >> >> >> doesn't deliver benefits. The benefit of protobuf is that it's the RPC
> > > >> >> >> format for Hadoop and HBase, which allows Drill to draw on the broad
> > > >> >> >> experience of those communities who need to implement high
> > > >> >> >> performance backwards compatible RPC serialization.
> > > >> >> >>
> > > >> >> >> ====
> > > >> >> >>
> > > >> >> >> Expanding a bit, I've looked into this issue a lot, and there are very
> > > >> >> >> few significant concrete reasons to choose protobuf vs thrift. Tiny
> > > >> >> >> percent faster of this, and that, etc. I'd strongly suggest protobuf
> > > >> >> >> for the expanded community. There is no particular Apache imperative
> > > >> >> >> that Apache projects re-use libraries. Use what makes sense for your
> > > >> >> >> project.
> > > >> >> >>
> > > >> >> >> As regards Avro, it's a fine serialization format for long term
> > > >> >> >> data retention, but the complexities that exist to enable that make it
> > > >> >> >> non-ideal for an RPC. I know of no one who uses AvroRPC in any form.
> > > >> >> >>
> > > >> >> >> -ryan
> > > >> >> >>
> > > >> >> >> On Tue, Sep 4, 2012 at 12:30 PM, Tomer Shiran <[email protected]> wrote:
> > > >> >> >> > We plan to propose the architecture and interfaces in the next
> > > >> >> >> > couple weeks, which will make it easy to divide the project into
> > > >> >> >> > clear building blocks.
> > > >> >> >> > At that point it will be easier to start contributing different
> > > >> >> >> > data sources, data formats, operators, query languages, etc.
> > > >> >> >> >
> > > >> >> >> > The contributions are done in the usual Apache way. It's best to
> > > >> >> >> > open a JIRA and then post a patch so that others can review and then
> > > >> >> >> > a committer can check it in.
> > > >> >> >> >
> > > >> >> >> > On Tue, Sep 4, 2012 at 12:23 PM, Chandan Madhesia <[email protected]> wrote:
> > > >> >> >> >
> > > >> >> >> >> Hi
> > > >> >> >> >>
> > > >> >> >> >> What is the process to become a contributor to drill?
> > > >> >> >> >>
> > > >> >> >> >> Regards
> > > >> >> >> >> chandan
> > > >> >> >> >>
> > > >> >> >> >> On Tue, Sep 4, 2012 at 9:51 PM, Ted Dunning <[email protected]> wrote:
> > > >> >> >> >>
> > > >> >> >> >> > Suffice it to say that if *you* think it is important enough to
> > > >> >> >> >> > implement and maintain, then the group shouldn't say nay. The
> > > >> >> >> >> > consensus stuff should only block things that break something else.
> > > >> >> >> >> > Additive features that are highly maintainable (or which come with
> > > >> >> >> >> > commitments) shouldn't generally be blocked.
> > > >> >> >> >> >
> > > >> >> >> >> > On Tue, Sep 4, 2012 at 9:14 AM, Michael Hausenblas <[email protected]> wrote:
> > > >> >> >> >> >
> > > >> >> >> >> > > Good. Feel free to put me down for that, if the group as a whole
> > > >> >> >> >> > > thinks that (supporting Thrift) makes sense.
> > > >> >> >> >
> > > >> >> >> > --
> > > >> >> >> > Tomer Shiran
> > > >> >> >> > Director of Product Management | MapR Technologies | 650-804-8657

--
My research interests are distributed systems, parallel computing and bytecode based virtual machine.

My profile: http://www.linkedin.com/in/coderplay
My blog: http://coderplay.javaeye.com
