On Tue, Jan 5, 2010 at 4:18 AM, Jake Mannix <jake.man...@gmail.com> wrote:

> This is how I've
> in general looked at Writables - they tend very much to be very loosely
> object oriented, in the sense that they are typically just wrappers around
> some data object, and provide marshalling/unmarshalling capabilities for
> said object (but the Writable itself rarely (ever?) actually also implements
> any useful interface the held object implements - when you want said object,
> you Writable.get() on it to fetch the inner guy).

Yes, this is exactly the approach taken (from an API perspective, at
least) by the Text Writable in hadoop-core, for instance.
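To make the pattern concrete, here is a minimal sketch of that wrapper style. It deliberately does not implement the real org.apache.hadoop.io.Writable interface so it stays self-contained; StringWritable and its get() method are stand-ins mirroring the shape Jake describes, not actual Hadoop classes.

```java
import java.io.*;

// Sketch of the wrapper pattern: the Writable holds a plain value and
// only adds marshalling; callers unwrap with get() to reach the inner object.
public class StringWritable {
    private String value;

    public StringWritable() { this(""); }
    public StringWritable(String value) { this.value = value; }

    // Marshalling: write the held value to a binary stream.
    public void write(DataOutput out) throws IOException {
        out.writeUTF(value);
    }

    // Unmarshalling: replace the held value with one read from the stream.
    public void readFields(DataInput in) throws IOException {
        value = in.readUTF();
    }

    // The wrapper implements none of String's own interface;
    // when you want the inner guy, you fetch him explicitly.
    public String get() { return value; }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        new StringWritable("hello").write(new DataOutputStream(buf));

        StringWritable copy = new StringWritable();
        copy.readFields(new DataInputStream(
                new ByteArrayInputStream(buf.toByteArray())));
        System.out.println(copy.get());
    }
}
```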

>  Or is there a better way to do this?  What I really think is necessary, as
> an end-goal, is for us to be able to spit out int + Vector key-value pairs
> from mappers and reducers, and not need to know which kind they are in the
> mapper or reducer

Perhaps the space advantages for sparse and dense serialized forms
suggest the need for SparseVectorWritable and DenseVectorWritable?
Implementors could either choose which one to use, or perhaps a specific
implementation could be plugged in at runtime, similar to the way
similarity measures are injected. I suspect there must be some way to
hint to a single VectorWritable class what sort of vector the
serialized data should be read into.
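One way to give the reader that hint is to lead the serialized form with a flag byte, so a single class can dispatch between dense and sparse layouts in readFields. The sketch below is only an illustration of that idea under stated assumptions: the flat arrays stand in for real vector types, and this VectorWritable is not Mahout's or Hadoop's actual class.

```java
import java.io.*;

// One self-describing Writable: a leading flag byte tells readFields
// whether a dense or a sparse payload follows on the stream.
public class VectorWritable {
    private static final byte DENSE = 0, SPARSE = 1;

    private double[] dense;   // non-null when the vector is dense
    private int[] indices;    // non-null (with values) when sparse
    private double[] values;

    public void setDense(double[] v) {
        dense = v; indices = null; values = null;
    }

    public void setSparse(int[] idx, double[] vals) {
        indices = idx; values = vals; dense = null;
    }

    public void write(DataOutput out) throws IOException {
        if (dense != null) {
            out.writeByte(DENSE);
            out.writeInt(dense.length);
            for (double d : dense) out.writeDouble(d);
        } else {
            out.writeByte(SPARSE);
            out.writeInt(indices.length);
            for (int i = 0; i < indices.length; i++) {
                out.writeInt(indices[i]);
                out.writeDouble(values[i]);
            }
        }
    }

    public void readFields(DataInput in) throws IOException {
        byte flag = in.readByte();
        int n = in.readInt();
        if (flag == DENSE) {
            dense = new double[n]; indices = null; values = null;
            for (int i = 0; i < n; i++) dense[i] = in.readDouble();
        } else {
            indices = new int[n]; values = new double[n]; dense = null;
            for (int i = 0; i < n; i++) {
                indices[i] = in.readInt();
                values[i] = in.readDouble();
            }
        }
    }

    public boolean isDense() { return dense != null; }
    public int[] getIndices() { return indices; }
    public double[] getValues() { return values; }
}
```

With this layout the mappers and reducers only ever see VectorWritable, and the sparse/dense choice is made per record rather than per job.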

Have you seen any cases where a class hierarchy of Writables is
established to do something like that? E.g. the MapReduce jobs are
written against VectorWritable, but subclasses (e.g.
SparseVectorWritable) are available for specific needs?

Drew
