Hi, I've finally got some work done on the LWLR implementation. It's already functional when used with fixed weights of 1, i.e., plain linear regression. In that case each mapper gets a vector from the training data and calculates the A matrix (X'*W*X, with W being a diagonal matrix containing the weights for each training vector, currently W = I) and the b vector (X'*W*y, again currently with W = I) for that training vector. The reducer then sums the individual As and bs to get the final A and b, which are used to calculate the coefficient vector theta (I think it would be a good idea to have combiners calculate partial sums and let the reducer compute the final sum from the combiners' output). The job then loads another file containing input vectors for the prediction phase, constructs a matrix X from those vectors, and calculates the output as y = X * theta.
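In code, this is roughly what I mean (just a sketch using Mahout's math types; I'm writing the names like cross(), assignColumn() and QRDecomposition from memory, so they may need adjusting for the current API):

import org.apache.mahout.math.DenseMatrix;
import org.apache.mahout.math.Matrix;
import org.apache.mahout.math.QRDecomposition;
import org.apache.mahout.math.Vector;

public class NormalEquationsSketch {

  // Mapper side: contribution of one training vector x with target y
  // (W = I for now, so the weight is just 1).
  static Matrix partialA(Vector x) {
    return x.cross(x);        // outer product x * x'
  }

  static Vector partialB(Vector x, double y) {
    return x.times(y);        // x * y
  }

  // Reducer side: A = sum of the A_i, b = sum of the b_i, then solve A * theta = b.
  static Vector solveTheta(Matrix a, Vector b) {
    Matrix bCol = new DenseMatrix(b.size(), 1);
    bCol.assignColumn(0, b);
    Matrix theta = new QRDecomposition(a).solve(bCol);
    return theta.viewColumn(0);
  }
}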
Now, LWLR doesn't work like that, since for each prediction input we need a different theta vector. So as a first step it would make sense to give the algorithm a set of training vectors (containing input vectors and target scalars) and just one prediction input vector. Each mapper would then do the same as it does now, except that it would also calculate the weight for its training vector from the training input vector and the prediction input vector. Which brings me to my question: how can I share the prediction input vector between those individual mappers? I don't want each mapper to have to load it from a file. I think a good solution would be to pass it via the configuration. On a Hadoop-related forum or list, someone suggested serializing the object you want to share into a String and putting that String into the configuration. Do you think that's a good idea? If yes, what is the proper Mahout way of serializing a Vector to a String and deserializing it back into a Vector later? Something like the sketch below is what I have in mind.
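Here's roughly what I had in mind, so you can tell me whether this is sensible or whether there's a more idiomatic Mahout way. I'm assuming VectorWritable plus Base64 from commons-codec for the stringifying, a made-up config key "lwlr.query.vector", and a Gaussian kernel with bandwidth tau for the per-training-vector weight:

import java.io.IOException;

import org.apache.commons.codec.binary.Base64;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.DataInputBuffer;
import org.apache.hadoop.io.DataOutputBuffer;
import org.apache.mahout.math.Vector;
import org.apache.mahout.math.VectorWritable;

public class QueryVectorSketch {

  // Driver side: serialize the prediction input vector into the job configuration.
  static void storeQueryVector(Configuration conf, Vector query) throws IOException {
    DataOutputBuffer out = new DataOutputBuffer();
    new VectorWritable(query).write(out);
    byte[] bytes = new byte[out.getLength()];
    System.arraycopy(out.getData(), 0, bytes, 0, out.getLength());
    conf.set("lwlr.query.vector", Base64.encodeBase64String(bytes));
  }

  // Mapper setup: read the prediction input vector back out of the configuration.
  static Vector loadQueryVector(Configuration conf) throws IOException {
    byte[] bytes = Base64.decodeBase64(conf.get("lwlr.query.vector"));
    DataInputBuffer in = new DataInputBuffer();
    in.reset(bytes, bytes.length);
    VectorWritable vw = new VectorWritable();
    vw.readFields(in);
    return vw.get();
  }

  // Gaussian kernel weight for one training vector x, given the query vector and bandwidth tau.
  static double weight(Vector x, Vector query, double tau) {
    double d2 = x.getDistanceSquared(query);
    return Math.exp(-d2 / (2.0 * tau * tau));
  }
}

Thanks, Alex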