On Tue, Jan 5, 2010 at 4:18 AM, Jake Mannix <jake.man...@gmail.com> wrote:
> This is how I've in general looked at Writables - they tend very much to
> be very loosely object oriented, in the sense that they are typically
> just wrappers around some data object, and provide
> marshalling/unmarshalling capabilities for said object (but the Writable
> itself rarely (ever?) actually also implements any useful interface the
> held object implements - when you want said object, you Writable.get()
> on it to fetch the inner guy).

Yes, this is exactly the approach taken (from an API perspective, at least)
with the Text Writable in hadoop-core, for instance.

> Or is there a better way to do this? What I really think is necessary,
> as an end-goal, is for us to be able to spit out int + Vector key-value
> pairs from mappers and reducers, and not need to know which kind they
> are in the mapper or reducer.

Perhaps the space advantages of the sparse and dense serialized forms
suggest the need for a SparseVectorWritable and a DenseVectorWritable?
Implementors could either choose which one to use, or a specific
implementation could be plugged in at runtime, similar to how similarity
measures are injected.

I suspect there must be some way to hint to a single VectorWritable class
what sort of vector the serialized data should be read into. Have you seen
any cases where a class hierarchy of Writables is established to do
something like that? E.g. the MapReduce jobs are written against
VectorWritable, but subclasses (e.g. SparseVectorWritable) are available
for specific needs?

Drew
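For what it's worth, here is a minimal sketch of the single-VectorWritable
idea. All names and the wire layout are hypothetical (this is not Mahout's
actual implementation, and the Writable interface below is a stand-in for
Hadoop's org.apache.hadoop.io.Writable so the example is self-contained):
a flag byte written first lets readFields() discover from the stream itself
whether the payload is dense or sparse, so mappers and reducers only ever
see VectorWritable.

```java
import java.io.*;
import java.util.*;

// Stand-in mirroring Hadoop's org.apache.hadoop.io.Writable contract,
// included only so this sketch compiles without the Hadoop jars.
interface Writable {
  void write(DataOutput out) throws IOException;
  void readFields(DataInput in) throws IOException;
}

// One VectorWritable whose serialized form starts with a flag byte;
// the reader branches on that flag, so callers need not know up front
// whether the bytes hold a dense or a sparse vector.
class VectorWritable implements Writable {
  private static final byte DENSE = 0, SPARSE = 1;

  private double[] dense;                    // set when holding a dense vector
  private SortedMap<Integer, Double> sparse; // set when holding a sparse vector
  private int cardinality;

  static VectorWritable ofDense(double[] values) {
    VectorWritable v = new VectorWritable();
    v.dense = values.clone();
    v.cardinality = values.length;
    return v;
  }

  static VectorWritable ofSparse(int cardinality, SortedMap<Integer, Double> nonZeros) {
    VectorWritable v = new VectorWritable();
    v.sparse = new TreeMap<>(nonZeros);
    v.cardinality = cardinality;
    return v;
  }

  double get(int i) {
    return dense != null ? dense[i] : sparse.getOrDefault(i, 0.0);
  }

  @Override public void write(DataOutput out) throws IOException {
    if (dense != null) {
      out.writeByte(DENSE);
      out.writeInt(cardinality);
      for (double d : dense) out.writeDouble(d);
    } else {
      out.writeByte(SPARSE);
      out.writeInt(cardinality);
      out.writeInt(sparse.size());              // number of non-zero entries
      for (Map.Entry<Integer, Double> e : sparse.entrySet()) {
        out.writeInt(e.getKey());
        out.writeDouble(e.getValue());
      }
    }
  }

  @Override public void readFields(DataInput in) throws IOException {
    byte flag = in.readByte();                  // discover the representation
    cardinality = in.readInt();
    if (flag == DENSE) {
      sparse = null;
      dense = new double[cardinality];
      for (int i = 0; i < cardinality; i++) dense[i] = in.readDouble();
    } else {
      dense = null;
      sparse = new TreeMap<>();
      int n = in.readInt();
      for (int i = 0; i < n; i++) sparse.put(in.readInt(), in.readDouble());
    }
  }
}
```

A subclass hierarchy (SparseVectorWritable extends VectorWritable) could
still layer on top of this for jobs that know their representation and want
to skip the flag, but the flag-byte approach keeps the common case down to
one key/value type.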