Hi Doug,

On Mon, Apr 6, 2009 at 12:12 PM, Doug Cutting <[email protected]> wrote:
> Chad Walters wrote:
>>
>> -- You suggest that there is not a lot in Thrift that Avro can
>> leverage. I think you may be overlooking the fact that Thrift has a
>> user base and a community of developers who are very interested in
>> issues of cross-language data serialization and interoperability.
>
> I meant that in terms of common code, not coders. Coders can belong to more
> than one community but code should generally not. Hadoop Core has become a
> sprawling community that we're trying to split. It's more productive to
> have more, small communities than few large ones. A project needs a
> handful of active developers, but too many and it becomes ungainly. So, if
> it's technically possible for a codebase to be distinct, and it can attract
> enough active developers to sustain itself, that is a preferable structure.

I agree with you in general, but cross-language libraries require
larger communities than other projects. It's non-trivial to gather
groups of coders to support each language the project chooses to
include. Right now Thrift has some level of support for a dozen
languages. We've been very active in the last several months, and devs
have come out of the woodwork to extend the bindings for their favorite
languages. The overhead for those people (or some equivalent group) to
pay attention to another mailing list, another bug tracker, another IRC
channel, and another community isn't trivial.

I understand that developing the code itself may be more convenient for
some, but I think that the community that supports the code is what
really counts. If we can share that, and still achieve our goals, I
think we'll be better off.

Of course, this assumes that one of the primary goals of Avro is to be
cross-language. Is that the case, or have I misunderstood?

> Avro has unions and a null type, while Thrift does not. Does Thrift support
> recursive data structures?

We don't support recursive data structures. We do, however, have a
ticket open where we're discussing union support (THRIFT-409).

In your post you talk about the problems associated with supporting
multiple serialization formats. One of the things I like about Thrift
is that even though Thrift supports many different things, application
developers aren't at all obligated to use them all. In fact, I don't
expect anyone does. It would be perfectly reasonable for Hadoop to
specify that it uses the Avro data format for transmissions, and for
Thrift to be the cross-language library that provides the API. I think
you said something similar in your post, but if not please do clarify.

On the "names vs field ids" issue: I know that the Ruby and Java Thrift
libraries provide name-based access to this information, and I know of
no restriction that would keep the others from doing the same. It's
just a matter of a little code.

>> Consider an alternative: making Avro more like a sub-project of
>> Thrift or just implementing it directly in Thrift.
>
> I looked into changing Thrift to support Avro's features, and it was very
> messy. Perhaps someone else could do this more easily.
>
> Building Avro as a part of Thrift would take considerably more effort for me
> and I think offer little more than it does separately. If you feel
> differently, you are free to fork Avro, start a competitor, provide patches
> that integrate it into Thrift, or whatever.

I'd again like to appeal to you that it's the community that's harder
to develop than the code, and we've got one already. I also don't see
the implementation being especially difficult, but maybe we're looking
at different information. I'd be happy to talk with you about it if
you're open to the idea.

The goals of Avro seem to be consistent with the goals of each of
Thrift's contributors who have developed a new protocol. We can already
offer the things you've stated you don't want to develop, and I think
we have a lot more to gain by working together than by working
separately.

That being said, I'm fairly confident we'll be providing an Avro
protocol on our own at some point if you're not interested in working
together. But I think that if we go down that path we'll be doing a
disservice to users of both Thrift and Avro.

-- 
Kevin Clark
http://glu.ttono.us
