Hello everyone, I came across this video a while back and would like to share it with the list. eBay presented a comparison of various serialization techniques, covering their performance for different payloads, serialized size, etc.
Presentation: http://qconsf.com/dl/qcon-sanfran-2011/slides/SastryMalladi_DealingWithPerformanceChallengesOptimizedSerializationTechniques.pdf
Video: http://www.infoq.com/presentations/Dealing-with-Performance-Challenges-Optimized-Data-Formats

Protobuf performs well on all criteria, especially at large payload sizes, and it achieves better size reduction. Both are critical for Drill.

Thanks
-- Prasanth

On Sep 15, 2012, at 9:42 AM, Ted Dunning <[email protected]> wrote:

> Yarn is also strictly client/server which leads to all kinds of problems in
> Hadoop.
>
> YarnRPC -1
>
> On Sat, Sep 15, 2012 at 5:26 AM, Min Zhou <[email protected]> wrote:
>
>> YarnRPC -1
>>
>> That's quite inefficient in my experience and doesn't support
>> multi-languages currently.
>>
>> On Sat, Sep 15, 2012 at 7:51 PM, Hyunsik Choi <[email protected]> wrote:
>>
>>> +1 for json as initial data format
>>>
>>> In addition, I recommend YarnRPC with protocol buffer for internal RPC
>>> and API RPC. Protocol buffer is portable to other languages. If we use
>>> another RPC system, we have to additionally consider the security aspect
>>> of Hadoop.
>>>
>>> --
>>> Hyunsik Choi
>>>
>>> On Sat, Sep 15, 2012 at 8:39 PM, Min Zhou <[email protected]> wrote:
>>>
>>>> There should be 2 types of serialization method. One should define its
>>>> schema, for the use of RPC, user wire API; while the other need not
>>>> define schema, it typically for internal data transfer, I think fastjson
>>>> or kryo is quite suitable for the latter purpose.
>>>>
>>>>
>>>> Regards,
>>>> Min
>>>>
>>>> On Sat, Sep 15, 2012 at 5:49 PM, Michael Hausenblas <[email protected]> wrote:
>>>>
>>>>>
>>>>> Point taken … +1 for protobuf - from my POV we can close ISSUE-1
>>>>>
>>>>>> The question of an internal wire format, btw, does not constrain the
>>>>>> project relative to external access.
>>>>>
>>>>> Sounds sensible.
>>>>>
>>>>> The only one thing I really don't get is: why did you put Avro and JSON
>>>>> into the proposal [1] in the first place? Or is this the 'external
>>>>> access' from above?
>>>>>
>>>>> Cheers,
>>>>> Michael
>>>>>
>>>>> [1] http://wiki.apache.org/incubator/DrillProposal
>>>>>
>>>>> --
>>>>> Michael Hausenblas
>>>>> Ireland, Europe
>>>>> http://mhausenblas.info/
>>>>>
>>>>> On 14 Sep 2012, at 22:31, Ted Dunning wrote:
>>>>>
>>>>>> I think that it is important to ask a few questions leading up a
>>>>>> decision here.
>>>>>>
>>>>>> The first is a (rhetorical) show of hands about how many people believe
>>>>>> that there are no serious performance or expressivity killers when
>>>>>> comparing alternative serialization frameworks. As far as I know,
>>>>>> performance differences are not massive (and protobufs is one of the
>>>>>> leaders in any case) and the expressivity differences are essentially
>>>>>> nil. If somebody feels that there is a serious show-stopper with any
>>>>>> option, they should speak.
>>>>>>
>>>>>> The second is to ask the sense of the community whether they judge
>>>>>> progress or perfection in this decision is most important to the
>>>>>> project. My guess is that almost everybody would prefer to see progress
>>>>>> as long as the technical choice is not subject to some horrid missing
>>>>>> bit.
>>>>>>
>>>>>> The final question is whether it is reasonable to go along with
>>>>>> protobufs given that several very experienced engineers prefer it and
>>>>>> would like to produce code based on it. If the first two answers are
>>>>>> answered to the effect of protobufs is about as good as we will find
>>>>>> and that progress trumps small differences, then it seems that moving
>>>>>> to follow this preference of Jason and Ryan for protobufs might be a
>>>>>> reasonable thing to do.
>>>>>>
>>>>>> The question of an internal wire format, btw, does not constrain the
>>>>>> project relative to external access. I think it is important to support
>>>>>> JDBC and ODBC and whatever is in common use for querying. For external
>>>>>> access the question is quite different. Whereas for the internal format
>>>>>> consensus around a single choice has large benefits, the external
>>>>>> format choice is nearly the opposite. For an external format, limiting
>>>>>> ourselves to a single choice seems like a bad idea and increasing the
>>>>>> audience seems like a better choice.
>>>>>>
>>>>>> On Fri, Sep 14, 2012 at 12:44 PM, Ryan Rawson <[email protected]> wrote:
>>>>>>
>>>>>>> Hi folks,
>>>>>>>
>>>>>>> I just commented on this first JIRA. Here is my text:
>>>>>>>
>>>>>>> This issue has been hashed over a lot in the Hadoop projects. There
>>>>>>> was work done to compare thrift vs avro vs protobuf. The conclusion
>>>>>>> was protobuf was the decision to use.
>>>>>>>
>>>>>>> Prior to this move, there had been a lot of noise about pluggable RPC
>>>>>>> transports, and whatnot. It held up adoption of a backwards compatible
>>>>>>> serialization framework for a long time. The problem ended up being
>>>>>>> the analysis-paralysis, rather than the specific implementation
>>>>>>> problem. In other words, the problem was a LACK of implementation than
>>>>>>> actual REAL problems.
>>>>>>>
>>>>>>> Based on this experience, I'd strongly suggest adopting protobuf and
>>>>>>> moving on. Forget about pluggable RPC implementations, the complexity
>>>>>>> doesnt deliver benefits. The benefits of protobuf is that its the RPC
>>>>>>> format for Hadoop and HBase, which allows Drill to draw on the broad
>>>>>>> experience of those communities who need to implement high performance
>>>>>>> backwards compatible RPC serialization.
>>>>>>>
>>>>>>> ====
>>>>>>>
>>>>>>> Expanding a bit, I've looked in to this issue a lot, and there is very
>>>>>>> few significant concrete reasons to choose protobuf vs thrift. Tiny
>>>>>>> percent faster of this, and that, etc. I'd strongly suggest protobuf
>>>>>>> for the expanded community. There is no particular Apache imperative
>>>>>>> that Apache projects re-use libraries. Use what makes sense for your
>>>>>>> project.
>>>>>>>
>>>>>>> As regards to Avro, it's a fine serialization format for long term
>>>>>>> data retention, but the complexities that exist to enable that make it
>>>>>>> non-ideal for an RPC. I know of no one who uses AvroRPC in any form.
>>>>>>>
>>>>>>> -ryan
>>>>>>>
>>>>>>> On Tue, Sep 4, 2012 at 12:30 PM, Tomer Shiran <[email protected]> wrote:
>>>>>>>> We plan to propose the architecture and interfaces in the next couple
>>>>>>>> weeks, which will make it easy to divide the project into clear
>>>>>>>> building blocks. At that point it will be easier to start
>>>>>>>> contributing different data sources, data formats, operators, query
>>>>>>>> languages, etc.
>>>>>>>>
>>>>>>>> The contributions are done in the usual Apache way. It's best to open
>>>>>>>> a JIRA and then post a patch so that others can review and then a
>>>>>>>> committer can check it in.
>>>>>>>>
>>>>>>>> On Tue, Sep 4, 2012 at 12:23 PM, Chandan Madhesia <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hi
>>>>>>>>>
>>>>>>>>> What is the process to become a contributor to drill ?
>>>>>>>>>
>>>>>>>>> Regards
>>>>>>>>> chandan
>>>>>>>>>
>>>>>>>>> On Tue, Sep 4, 2012 at 9:51 PM, Ted Dunning <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Suffice it to say that if *you* think it is important enough to
>>>>>>>>>> implement and maintain, then the group shouldn't say naye. The
>>>>>>>>>> consensus stuff should only block things that break something else.
>>>>>>>>>> Additive features that are highly maintainable (or which come with
>>>>>>>>>> commitments) shouldn't generally be blocked.
>>>>>>>>>>
>>>>>>>>>> On Tue, Sep 4, 2012 at 9:14 AM, Michael Hausenblas <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Good. Feel free to put me down for that, if the group as a whole
>>>>>>>>>>> thinks that (supporting Thrift) makes sense.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Tomer Shiran
>>>>>>>> Director of Product Management | MapR Technologies | 650-804-8657
>>>>>>>
>>>>>
>>>>
>>>> --
>>>> My research interests are distributed systems, parallel computing and
>>>> bytecode based virtual machine.
>>>>
>>>> My profile:
>>>> http://www.linkedin.com/in/coderplay
>>>> My blog:
>>>> http://coderplay.javaeye.com
>>>>
>>>
>>
>> --
>> My research interests are distributed systems, parallel computing and
>> bytecode based virtual machine.
>>
>> My profile:
>> http://www.linkedin.com/in/coderplay
>> My blog:
>> http://coderplay.javaeye.com
>>
