There is no problem with using a third-party library licensed under the Apache License.
However, in the case of this particular library: no, I would not use JSON. For one, we just finished removing usages of it, and even then it was never used for key/value serialization. It's a somewhat verbose format and just not appropriate at scale, where a compact binary format can save terabytes of storage and network transfer, not to mention hours of CPU.

I don't think it's hard or time-consuming to write Writable implementations for the few new key/value classes you'll need. Most everything you'll want is already written by Mahout or Hadoop. The read / write methods you'd implement are just tens of lines of code anyway.

On Mon, May 9, 2011 at 8:03 PM, Dhruv <[email protected]> wrote:
> Cloud 9 is an easy to use Hadoop MapReduce library by Jimmy Lin from the
> University of Maryland using the Apache 2.0 license (
> http://www.umiacs.umd.edu/~jimmylin/cloud9/docs/). The library contains a
> very convenient, lightweight JSON serializable class. One can use this class
> instead of rolling your own custom serializable objects and it could help me
> for the GSOC .
>
> What are Mahout's/ASF's policies regarding the use of such open third party
> libraries?
>
> What is the general opinion regarding using JSON serialization on Hadoop?
>
> In another email conversation, Grant did mention that JSON is slow and also
> that GSON had been used in the past by Mahout.
>
> Also, I had allocated sufficient time in my proposal, almost one month for
> implementing this custom object during the mapper's implementation so I
> could still just go ahead as planned before.
>
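
P.S. To give a rough idea of how small a Writable implementation tends to be, here is a minimal sketch. Everything in it is an assumption for illustration: TermCount is a hypothetical value class, not something from Mahout or Cloud 9, and the local Writable interface is only a stand-in that mirrors the two methods of org.apache.hadoop.io.Writable so that the sketch compiles without Hadoop on the classpath. In a real job you would implement Hadoop's own interface instead.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Stand-in (assumption) mirroring org.apache.hadoop.io.Writable,
// so this sketch compiles without the Hadoop jars.
interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical value class: a term and how often it was seen.
final class TermCount implements Writable {
    private String term = "";
    private long count;

    // No-arg constructor: Hadoop creates Writables reflectively,
    // then fills them in via readFields().
    TermCount() {}

    TermCount(String term, long count) {
        this.term = term;
        this.count = count;
    }

    String getTerm() { return term; }
    long getCount() { return count; }

    // Serialize the fields in a fixed order as compact binary.
    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(term);
        out.writeLong(count);
    }

    // Deserialize the fields in exactly the order write() emitted them.
    @Override
    public void readFields(DataInput in) throws IOException {
        term = in.readUTF();
        count = in.readLong();
    }

    // Helper for the demo: serialize any Writable to a byte array.
    static byte[] toBytes(Writable w) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buffer);
            w.write(out);
            out.flush();
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Helper for the demo: rebuild a TermCount from serialized bytes.
    static TermCount fromBytes(byte[] bytes) {
        try {
            TermCount result = new TermCount();
            result.readFields(new DataInputStream(new ByteArrayInputStream(bytes)));
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

final class WritableSketch {
    public static void main(String[] args) {
        byte[] bytes = TermCount.toBytes(new TermCount("mahout", 42L));
        TermCount copy = TermCount.fromBytes(bytes);
        System.out.println(copy.getTerm() + " " + copy.getCount()
                + " (" + bytes.length + " bytes)");
    }
}
```

The only real rule is that readFields() reads the fields in the same order write() wrote them. A key class would additionally need to be comparable (WritableComparable in Hadoop), which this sketch leaves out.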
