This might be more appropriate on the Mahout list. I have copied that list in order to gain the largest audience for the answers.
It is an absolute requirement in Mahout to have multiple vector implementations. It is also a requirement that the math library not depend on Hadoop. A third absolute requirement is that very simple Java programming suffice for working with Vectors of many types as well as Matrix values.

To meet these requirements and still allow the simplest form of map-reduce programming, we implemented a class, VectorWritable, which wraps any kind of Vector as a Writable object. You can retrieve the underlying vector from the VectorWritable, and there is some discussion about making VectorWritable implement the Vector interface as well.

If your code returns a VectorWritable, then Hadoop should be able to serialize it trivially. If your code returns a Vector, however, it will not natively be serializable. It should be possible, though, to inject a single registration into Kryo that understands how to serialize Vectors using the VectorWritable infrastructure.

On Sat, Jan 12, 2013 at 11:49 PM, Koert Kuipers <[email protected]> wrote:

> i would like to have some mahout vectors flow through a scalding job. i
> thought at first that this should be easy since the mahout vector is a
> writable so if i put it in the tuple all will be fine. but then i realized
> mahout did this thing where they split up the vector in a whole bunch of
> classes and interfaces: they have the Vector interface, implementations
> such as DenseVector and SparseSequentialAccessVector, and then the class
> VectorWritable which takes a Vector and turns it into a Writable. argh. so
> now if i have for example a DenseVector then i think it will not get
> serialized as a Writable and then kryo will attempt to serialize it
> instead, which is not what i want. any ideas for an elegant solution (i
> wish a simple scala implicit conversion would do the trick!). should i add
> a custom hadoop Serializer to catch these (seems ugly)?
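To make the wrapper idea in the reply concrete, here is a minimal, self-contained sketch of the pattern, using only java.io. Every name in it (SimpleVector, DenseVec, VecWritable, VecCodec) is a hypothetical stand-in and not a Mahout, Hadoop, or Kryo class; the write/readFields pair only mirrors the general shape of a Writable. It shows how the math types can stay free of any I/O dependency while a separate wrapper carries the serialization, and how a helper can wrap on the way out and unwrap on the way in. A custom serializer for bare vectors could plausibly delegate to a helper of this kind, but that wiring is not shown here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for a math-only vector interface (no I/O dependency).
interface SimpleVector {
    int size();
    double get(int index);
}

// Hypothetical dense implementation backed by a defensive copy of the input.
final class DenseVec implements SimpleVector {
    private final double[] values;
    DenseVec(double... values) { this.values = values.clone(); }
    public int size() { return values.length; }
    public double get(int index) { return values[index]; }
}

// Hypothetical wrapper that adds serialization around any SimpleVector.
final class VecWritable {
    private SimpleVector vector;
    VecWritable() {}
    VecWritable(SimpleVector vector) { this.vector = vector; }
    SimpleVector get() { return vector; }

    // Writes the length followed by each element.
    void write(DataOutput out) throws IOException {
        out.writeInt(vector.size());
        for (int i = 0; i < vector.size(); i++) {
            out.writeDouble(vector.get(i));
        }
    }

    // Reads the length and elements back; always rebuilds a DenseVec here.
    void readFields(DataInput in) throws IOException {
        double[] values = new double[in.readInt()];
        for (int i = 0; i < values.length; i++) {
            values[i] = in.readDouble();
        }
        vector = new DenseVec(values);
    }
}

// Hypothetical helper: wrap a bare vector to serialize it, unwrap after reading.
final class VecCodec {
    static byte[] toBytes(SimpleVector v) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            new VecWritable(v).write(new DataOutputStream(bytes));
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static SimpleVector fromBytes(byte[] data) {
        try {
            VecWritable wrapper = new VecWritable();
            wrapper.readFields(new DataInputStream(new ByteArrayInputStream(data)));
            return wrapper.get();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With this arrangement, code that works with vectors never sees the wrapper; only the serialization boundary calls VecCodec.toBytes and VecCodec.fromBytes. Note that this simplified sketch always reads back a dense vector, so it does not preserve the original implementation type.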
