More battle-tested than a more-than-20-year-old standard used in almost every telecom protocol that exists nowadays? I think your statement is a little on the "too bold" side. :)
On Sat, Sep 15, 2012 at 1:30 AM, Ryan Rawson <[email protected]> wrote:
> Funny thing, given how much use protobufs has been put thru, I think
> one could make the argument it's more battle tested than ASN.1 ...
>
> On Fri, Sep 14, 2012 at 3:24 PM, Constantine Peresypkin
> <[email protected]> wrote:
> > Protobuf is an attempt to make ASN.1 more developer friendly (not a bad
> > attempt).
> > It's simpler, has far fewer features, is easier to implement and has a
> > compact encoding.
> > But on the other hand it's non-standard, a "reinvented wheel" (they could
> > just do a "better than PER" encoding for ASN.1), and AFAIK has no support
> > for the new and shiny Google encodings, like "group varint".
> > All in all, in the current situation it seems a better choice than ASN.1,
> > not even arguing about something even more vague and non-standard like
> > Thrift.
> >
> > On Sat, Sep 15, 2012 at 12:38 AM, Ryan Rawson <[email protected]> wrote:
> >
> >> Thanks for that Ted.
> >>
> >> Correct - internal wire format doesn't mean 'drill only supports
> >> protobuf encoded data'.
> >>
> >> Part of the reason to favor protobuf is that a lot of people in the
> >> broader 'big data' community are building a lot of experience with it.
> >> Hadoop and HBase both are moving to/moved to protobuf on the wire.
> >> Being able to leverage this expertise is valuable.
> >>
> >> There is a JIRA in Hadoop-land where someone had done a deep dive
> >> 'bake off' between thrift, protobuf and avro. The ultimate choice was
> >> protobuf for a number of reasons. If people want to re-do the
> >> analysis, I'd like to see it in the context of THAT analysis (eg: why
> >> the assumptions there are not the same for Drill)... if anything it'd
> >> give a concrete form to what can be a mire.
> >>
> >> For what it's worth, I've had many discussions along these angles with
> >> a variety of people including committers on Thrift, and the consensus
> >> is both are good choices.
> >>
> >> -ryan
> >>
> >> On Fri, Sep 14, 2012 at 2:31 PM, Ted Dunning <[email protected]>
> >> wrote:
> >> > I think that it is important to ask a few questions leading up to a
> >> > decision here.
> >> >
> >> > The first is a (rhetorical) show of hands about how many people
> >> > believe that there are no serious performance or expressivity killers
> >> > when comparing alternative serialization frameworks. As far as I know,
> >> > performance differences are not massive (and protobufs is one of the
> >> > leaders in any case) and the expressivity differences are essentially
> >> > nil. If somebody feels that there is a serious show-stopper with any
> >> > option, they should speak.
> >> >
> >> > The second is to ask the sense of the community whether they judge
> >> > progress or perfection in this decision to be most important to the
> >> > project. My guess is that almost everybody would prefer to see
> >> > progress as long as the technical choice is not subject to some horrid
> >> > missing bit.
> >> >
> >> > The final question is whether it is reasonable to go along with
> >> > protobufs given that several very experienced engineers prefer it and
> >> > would like to produce code based on it. If the first two questions are
> >> > answered to the effect that protobufs is about as good as we will find
> >> > and that progress trumps small differences, then it seems that moving
> >> > to follow this preference of Jason and Ryan for protobufs might be a
> >> > reasonable thing to do.
> >> >
> >> > The question of an internal wire format, btw, does not constrain the
> >> > project relative to external access. I think it is important to
> >> > support JDBC and ODBC and whatever is in common use for querying. For
> >> > external access the question is quite different. Whereas for the
> >> > internal format consensus around a single choice has large benefits,
> >> > the external format choice is nearly the opposite.
> >> > For an external format, limiting ourselves to a single choice seems
> >> > like a bad idea and increasing the audience seems like a better
> >> > choice.
> >> >
> >> > On Fri, Sep 14, 2012 at 12:44 PM, Ryan Rawson <[email protected]>
> >> > wrote:
> >> >
> >> >> Hi folks,
> >> >>
> >> >> I just commented on this first JIRA. Here is my text:
> >> >>
> >> >> This issue has been hashed over a lot in the Hadoop projects. There
> >> >> was work done to compare thrift vs avro vs protobuf. The conclusion
> >> >> was protobuf was the decision to use.
> >> >>
> >> >> Prior to this move, there had been a lot of noise about pluggable RPC
> >> >> transports, and whatnot. It held up adoption of a backwards
> >> >> compatible serialization framework for a long time. The problem ended
> >> >> up being the analysis-paralysis, rather than the specific
> >> >> implementation problem. In other words, the problem was a LACK of
> >> >> implementation rather than actual REAL problems.
> >> >>
> >> >> Based on this experience, I'd strongly suggest adopting protobuf and
> >> >> moving on. Forget about pluggable RPC implementations, the complexity
> >> >> doesn't deliver benefits. The benefit of protobuf is that it's the
> >> >> RPC format for Hadoop and HBase, which allows Drill to draw on the
> >> >> broad experience of those communities who need to implement high
> >> >> performance backwards compatible RPC serialization.
> >> >>
> >> >> ====
> >> >>
> >> >> Expanding a bit, I've looked into this issue a lot, and there are
> >> >> very few significant concrete reasons to choose protobuf vs thrift.
> >> >> Tiny percent faster of this, and that, etc. I'd strongly suggest
> >> >> protobuf for the expanded community. There is no particular Apache
> >> >> imperative that Apache projects re-use libraries. Use what makes
> >> >> sense for your project.
> >> >>
> >> >> As regards Avro, it's a fine serialization format for long term
> >> >> data retention, but the complexities that exist to enable that make
> >> >> it non-ideal for an RPC. I know of no one who uses AvroRPC in any
> >> >> form.
> >> >>
> >> >> -ryan
> >> >>
> >> >> On Tue, Sep 4, 2012 at 12:30 PM, Tomer Shiran <[email protected]>
> >> >> wrote:
> >> >> > We plan to propose the architecture and interfaces in the next
> >> >> > couple of weeks, which will make it easy to divide the project into
> >> >> > clear building blocks. At that point it will be easier to start
> >> >> > contributing different data sources, data formats, operators, query
> >> >> > languages, etc.
> >> >> >
> >> >> > The contributions are done in the usual Apache way. It's best to
> >> >> > open a JIRA and then post a patch so that others can review and
> >> >> > then a committer can check it in.
> >> >> >
> >> >> > On Tue, Sep 4, 2012 at 12:23 PM, Chandan Madhesia
> >> >> > <[email protected]> wrote:
> >> >> >
> >> >> >> Hi
> >> >> >>
> >> >> >> What is the process to become a contributor to Drill?
> >> >> >>
> >> >> >> Regards
> >> >> >> chandan
> >> >> >>
> >> >> >> On Tue, Sep 4, 2012 at 9:51 PM, Ted Dunning <[email protected]>
> >> >> >> wrote:
> >> >> >>
> >> >> >> > Suffice it to say that if *you* think it is important enough to
> >> >> >> > implement and maintain, then the group shouldn't say nay. The
> >> >> >> > consensus stuff should only block things that break something
> >> >> >> > else. Additive features that are highly maintainable (or which
> >> >> >> > come with commitments) shouldn't generally be blocked.
> >> >> >> >
> >> >> >> > On Tue, Sep 4, 2012 at 9:14 AM, Michael Hausenblas
> >> >> >> > <[email protected]> wrote:
> >> >> >> >
> >> >> >> > > Good. Feel free to put me down for that, if the group as a
> >> >> >> > > whole thinks that (supporting Thrift) makes sense.
> >> >> >> > >
> >> >> >> >
> >> >> >>
> >> >> >
> >> >> > --
> >> >> > Tomer Shiran
> >> >> > Director of Product Management | MapR Technologies | 650-804-8657
> >> >>
> >> >
