+1 for JSON as the initial data format.

In addition, I recommend YarnRPC with Protocol Buffers for both internal RPC and API RPC. Protocol Buffers is portable to other languages. If we use another RPC system, we would additionally have to consider the security aspects of Hadoop.
--
Hyunsik Choi

On Sat, Sep 15, 2012 at 8:39 PM, Min Zhou <[email protected]> wrote:

> There should be 2 types of serialization method. One should define its
> schema, for the use of RPC, user wire API; while the other need not define
> schema, it typically for internal data transfer, I think fastjson or kryo
> is quite suitable for the latter purpose.
>
> Regards,
> Min
>
> On Sat, Sep 15, 2012 at 5:49 PM, Michael Hausenblas <[email protected]> wrote:
>
> > Point taken … +1 for protobuf - from my POV we can close ISSUE-1
> >
> > > The question of an internal wire format, btw, does not constrain the
> > > project relative to external access.
> >
> > Sounds sensible.
> >
> > The only one thing I really don't get is: why did you put Avro and JSON
> > into the proposal [1] in the first place? Or is this the 'external access'
> > from above?
> >
> > Cheers,
> > Michael
> >
> > [1] http://wiki.apache.org/incubator/DrillProposal
> >
> > --
> > Michael Hausenblas
> > Ireland, Europe
> > http://mhausenblas.info/
> >
> > On 14 Sep 2012, at 22:31, Ted Dunning wrote:
> >
> > > I think that it is important to ask a few questions leading up a decision
> > > here.
> > >
> > > The first is a (rhetorical) show of hands about how many people believe
> > > that there are no serious performance or expressivity killers when
> > > comparing alternative serialization frameworks. As far as I know,
> > > performance differences are not massive (and protobufs is one of the
> > > leaders in any case) and the expressivity differences are essentially nil.
> > > If somebody feels that there is a serious show-stopper with any option,
> > > they should speak.
> > >
> > > The second is to ask the sense of the community whether they judge progress
> > > or perfection in this decision is most important to the project. My guess
> > > is that almost everybody would prefer to see progress as long as the
> > > technical choice is not subject to some horrid missing bit.
> > >
> > > The final question is whether it is reasonable to go along with protobufs
> > > given that several very experienced engineers prefer it and would like to
> > > produce code based on it. If the first two answers are answered to the
> > > effect of protobufs is about as good as we will find and that progress
> > > trumps small differences, then it seems that moving to follow this
> > > preference of Jason and Ryan for protobufs might be a reasonable thing to
> > > do.
> > >
> > > The question of an internal wire format, btw, does not constrain the
> > > project relative to external access. I think it is important to support
> > > JDBC and ODBC and whatever is in common use for querying. For external
> > > access the question is quite different. Whereas for the internal format
> > > consensus around a single choice has large benefits, the external format
> > > choice is nearly the opposite. For an external format, limiting ourselves
> > > to a single choice seems like a bad idea and increasing the audience seems
> > > like a better choice.
> > >
> > > On Fri, Sep 14, 2012 at 12:44 PM, Ryan Rawson <[email protected]> wrote:
> > >
> > >> Hi folks,
> > >>
> > >> I just commented on this first JIRA. Here is my text:
> > >>
> > >> This issue has been hashed over a lot in the Hadoop projects. There
> > >> was work done to compare thrift vs avro vs protobuf. The conclusion
> > >> was protobuf was the decision to use.
> > >>
> > >> Prior to this move, there had been a lot of noise about pluggable RPC
> > >> transports, and whatnot. It held up adoption of a backwards compatible
> > >> serialization framework for a long time. The problem ended up being
> > >> the analysis-paralysis, rather than the specific implementation
> > >> problem. In other words, the problem was a LACK of implementation than
> > >> actual REAL problems.
> > >>
> > >> Based on this experience, I'd strongly suggest adopting protobuf and
> > >> moving on.
> > >> Forget about pluggable RPC implementations, the complexity
> > >> doesnt deliver benefits. The benefits of protobuf is that its the RPC
> > >> format for Hadoop and HBase, which allows Drill to draw on the broad
> > >> experience of those communities who need to implement high performance
> > >> backwards compatible RPC serialization.
> > >>
> > >> ====
> > >>
> > >> Expanding a bit, I've looked in to this issue a lot, and there is very
> > >> few significant concrete reasons to choose protobuf vs thrift. Tiny
> > >> percent faster of this, and that, etc. I'd strongly suggest protobuf
> > >> for the expanded community. There is no particular Apache imperative
> > >> that Apache projects re-use libraries. Use what makes sense for your
> > >> project.
> > >>
> > >> As regards to Avro, it's a fine serialization format for long term
> > >> data retention, but the complexities that exist to enable that make it
> > >> non-ideal for an RPC. I know of no one who uses AvroRPC in any form.
> > >>
> > >> -ryan
> > >>
> > >> On Tue, Sep 4, 2012 at 12:30 PM, Tomer Shiran <[email protected]> wrote:
> > >>
> > >>> We plan to propose the architecture and interfaces in the next couple
> > >>> weeks, which will make it easy to divide the project into clear building
> > >>> blocks. At that point it will be easier to start contributing different
> > >>> data sources, data formats, operators, query languages, etc.
> > >>>
> > >>> The contributions are done in the usual Apache way. It's best to open a
> > >>> JIRA and then post a patch so that others can review and then a committer
> > >>> can check it in.
> > >>>
> > >>> On Tue, Sep 4, 2012 at 12:23 PM, Chandan Madhesia <[email protected]> wrote:
> > >>>
> > >>>> Hi
> > >>>>
> > >>>> Hi
> > >>>>
> > >>>> What is the process to become a contributor to drill ?
> > >>>>
> > >>>> Regards
> > >>>> chandan
> > >>>>
> > >>>> On Tue, Sep 4, 2012 at 9:51 PM, Ted Dunning <[email protected]> wrote:
> > >>>>
> > >>>>> Suffice it to say that if *you* think it is important enough to implement
> > >>>>> and maintain, then the group shouldn't say naye. The consensus stuff
> > >>>>> should only block things that break something else. Additive features that
> > >>>>> are highly maintainable (or which come with commitments) shouldn't
> > >>>>> generally be blocked.
> > >>>>>
> > >>>>> On Tue, Sep 4, 2012 at 9:14 AM, Michael Hausenblas <[email protected]> wrote:
> > >>>>>
> > >>>>>> Good. Feel free to put me down for that, if the group as a whole thinks
> > >>>>>> that (supporting Thrift) makes sense.
> > >>>
> > >>> --
> > >>> Tomer Shiran
> > >>> Director of Product Management | MapR Technologies | 650-804-8657
>
> --
> My research interests are distributed systems, parallel computing and
> bytecode based virtual machine.
>
> My profile:
> http://www.linkedin.com/in/coderplay
> My blog:
> http://coderplay.javaeye.com
