Yarn is also strictly client/server, which leads to all kinds of problems in Hadoop.
YarnRPC -1

On Sat, Sep 15, 2012 at 5:26 AM, Min Zhou <[email protected]> wrote:
> YarnRPC -1
>
> That's quite inefficient in my experience and doesn't support
> multi-languages currently.
>
> On Sat, Sep 15, 2012 at 7:51 PM, Hyunsik Choi <[email protected]> wrote:
>
> > +1 for json as initial data format
> >
> > In addition, I recommend YarnRPC with protocol buffer for internal RPC
> > and API RPC. Protocol buffer is portable to other languages. If we use
> > another RPC system, we have to additionally consider the security
> > aspect of Hadoop.
> >
> > --
> > Hyunsik Choi
> >
> > On Sat, Sep 15, 2012 at 8:39 PM, Min Zhou <[email protected]> wrote:
> >
> > > There should be 2 types of serialization method. One should define its
> > > schema, for the use of RPC, user wire API; while the other need not
> > > define schema, it typically for internal data transfer, I think
> > > fastjson or kryo is quite suitable for the latter purpose.
> > >
> > > Regards,
> > > Min
> > >
> > > On Sat, Sep 15, 2012 at 5:49 PM, Michael Hausenblas
> > > <[email protected]> wrote:
> > >
> > > > Point taken … +1 for protobuf - from my POV we can close ISSUE-1
> > > >
> > > > > The question of an internal wire format, btw, does not constrain
> > > > > the project relative to external access.
> > > >
> > > > Sounds sensible.
> > > >
> > > > The only one thing I really don't get is: why did you put Avro and
> > > > JSON into the proposal [1] in the first place? Or is this the
> > > > 'external access' from above?
> > > >
> > > > Cheers,
> > > > Michael
> > > >
> > > > [1] http://wiki.apache.org/incubator/DrillProposal
> > > >
> > > > --
> > > > Michael Hausenblas
> > > > Ireland, Europe
> > > > http://mhausenblas.info/
> > > >
> > > > On 14 Sep 2012, at 22:31, Ted Dunning wrote:
> > > >
> > > > > I think that it is important to ask a few questions leading up a
> > > > > decision here.
> > > > >
> > > > > The first is a (rhetorical) show of hands about how many people
> > > > > believe that there are no serious performance or expressivity
> > > > > killers when comparing alternative serialization frameworks. As far
> > > > > as I know, performance differences are not massive (and protobufs is
> > > > > one of the leaders in any case) and the expressivity differences are
> > > > > essentially nil. If somebody feels that there is a serious
> > > > > show-stopper with any option, they should speak.
> > > > >
> > > > > The second is to ask the sense of the community whether they judge
> > > > > progress or perfection in this decision is most important to the
> > > > > project. My guess is that almost everybody would prefer to see
> > > > > progress as long as the technical choice is not subject to some
> > > > > horrid missing bit.
> > > > >
> > > > > The final question is whether it is reasonable to go along with
> > > > > protobufs given that several very experienced engineers prefer it
> > > > > and would like to produce code based on it. If the first two answers
> > > > > are answered to the effect of protobufs is about as good as we will
> > > > > find and that progress trumps small differences, then it seems that
> > > > > moving to follow this preference of Jason and Ryan for protobufs
> > > > > might be a reasonable thing to do.
> > > > >
> > > > > The question of an internal wire format, btw, does not constrain the
> > > > > project relative to external access. I think it is important to
> > > > > support JDBC and ODBC and whatever is in common use for querying.
> > > > > For external access the question is quite different. Whereas for the
> > > > > internal format consensus around a single choice has large benefits,
> > > > > the external format choice is nearly the opposite.
> > > > > For an external format, limiting ourselves to a single choice seems
> > > > > like a bad idea and increasing the audience seems like a better
> > > > > choice.
> > > > >
> > > > > On Fri, Sep 14, 2012 at 12:44 PM, Ryan Rawson <[email protected]>
> > > > > wrote:
> > > > >
> > > > >> Hi folks,
> > > > >>
> > > > >> I just commented on this first JIRA. Here is my text:
> > > > >>
> > > > >> This issue has been hashed over a lot in the Hadoop projects. There
> > > > >> was work done to compare thrift vs avro vs protobuf. The conclusion
> > > > >> was protobuf was the decision to use.
> > > > >>
> > > > >> Prior to this move, there had been a lot of noise about pluggable
> > > > >> RPC transports, and whatnot. It held up adoption of a backwards
> > > > >> compatible serialization framework for a long time. The problem
> > > > >> ended up being the analysis-paralysis, rather than the specific
> > > > >> implementation problem. In other words, the problem was a LACK of
> > > > >> implementation than actual REAL problems.
> > > > >>
> > > > >> Based on this experience, I'd strongly suggest adopting protobuf
> > > > >> and moving on. Forget about pluggable RPC implementations, the
> > > > >> complexity doesnt deliver benefits. The benefits of protobuf is
> > > > >> that its the RPC format for Hadoop and HBase, which allows Drill to
> > > > >> draw on the broad experience of those communities who need to
> > > > >> implement high performance backwards compatible RPC serialization.
> > > > >>
> > > > >> ====
> > > > >>
> > > > >> Expanding a bit, I've looked in to this issue a lot, and there is
> > > > >> very few significant concrete reasons to choose protobuf vs thrift.
> > > > >> Tiny percent faster of this, and that, etc. I'd strongly suggest
> > > > >> protobuf for the expanded community. There is no particular Apache
> > > > >> imperative that Apache projects re-use libraries.
> > > > >> Use what makes sense for your project.
> > > > >>
> > > > >> As regards to Avro, it's a fine serialization format for long term
> > > > >> data retention, but the complexities that exist to enable that make
> > > > >> it non-ideal for an RPC. I know of no one who uses AvroRPC in any
> > > > >> form.
> > > > >>
> > > > >> -ryan
> > > > >>
> > > > >> On Tue, Sep 4, 2012 at 12:30 PM, Tomer Shiran <[email protected]>
> > > > >> wrote:
> > > > >>> We plan to propose the architecture and interfaces in the next
> > > > >>> couple weeks, which will make it easy to divide the project into
> > > > >>> clear building blocks. At that point it will be easier to start
> > > > >>> contributing different data sources, data formats, operators,
> > > > >>> query languages, etc.
> > > > >>>
> > > > >>> The contributions are done in the usual Apache way. It's best to
> > > > >>> open a JIRA and then post a patch so that others can review and
> > > > >>> then a committer can check it in.
> > > > >>>
> > > > >>> On Tue, Sep 4, 2012 at 12:23 PM, Chandan Madhesia
> > > > >>> <[email protected]> wrote:
> > > > >>>
> > > > >>>> Hi
> > > > >>>>
> > > > >>>> Hi
> > > > >>>>
> > > > >>>> What is the process to become a contributor to drill ?
> > > > >>>>
> > > > >>>> Regards
> > > > >>>> chandan
> > > > >>>>
> > > > >>>> On Tue, Sep 4, 2012 at 9:51 PM, Ted Dunning <[email protected]>
> > > > >>>> wrote:
> > > > >>>>
> > > > >>>>> Suffice it to say that if *you* think it is important enough to
> > > > >>>>> implement and maintain, then the group shouldn't say naye. The
> > > > >>>>> consensus stuff should only block things that break something
> > > > >>>>> else. Additive features that are highly maintainable (or which
> > > > >>>>> come with commitments) shouldn't generally be blocked.
> > > > >>>>>
> > > > >>>>> On Tue, Sep 4, 2012 at 9:14 AM, Michael Hausenblas
> > > > >>>>> <[email protected]> wrote:
> > > > >>>>>
> > > > >>>>>> Good. Feel free to put me down for that, if the group as a
> > > > >>>>>> whole thinks that (supporting Thrift) makes sense.
> > > > >>>>>>
> > > > >>>
> > > > >>> --
> > > > >>> Tomer Shiran
> > > > >>> Director of Product Management | MapR Technologies | 650-804-8657
> > >
> > > --
> > > My research interests are distributed systems, parallel computing and
> > > bytecode based virtual machine.
> > >
> > > My profile:
> > > http://www.linkedin.com/in/coderplay
> > > My blog:
> > > http://coderplay.javaeye.com
>
> --
> My research interests are distributed systems, parallel computing and
> bytecode based virtual machine.
>
> My profile:
> http://www.linkedin.com/in/coderplay
> My blog:
> http://coderplay.javaeye.com
