Hi, I've finally got some work done on the LWLR implementation. It's already functional when used with fixed weights of 1, i.e., plain linear regression. In that case each mapper gets a vector from the training data and calculates the A matrix (X'*W*X, with W being a diagonal matrix containing the weights for each training vector, currently W = I) and the b vector (X'*W*y, again currently with W = I) for that training vector. The reducer then sums the individual As and bs to get the final A and b, which are used to calculate the coefficient vector theta (I think it would be a good idea to have combiners calculate partial sums and let the reducer compute the final sum from the combiners' output). The job then loads another file containing input vectors for the prediction phase, constructs a matrix X from those vectors, and calculates the output as y = X * theta.
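In code, this is roughly what I mean (just a sketch using Mahout's math types; I'm writing the names like cross(), assignColumn() and QRDecomposition from memory, so they may need adjusting for the current API):

import org.apache.mahout.math.DenseMatrix;
import org.apache.mahout.math.Matrix;
import org.apache.mahout.math.QRDecomposition;
import org.apache.mahout.math.Vector;

public class NormalEquationsSketch {

  // Mapper side: contribution of one training vector x with target y
  // (W = I for now, so the weight is just 1).
  static Matrix partialA(Vector x) {
    return x.cross(x);        // outer product x * x'
  }

  static Vector partialB(Vector x, double y) {
    return x.times(y);        // x * y
  }

  // Reducer side: A = sum of the A_i, b = sum of the b_i, then solve A * theta = b.
  static Vector solveTheta(Matrix a, Vector b) {
    Matrix bCol = new DenseMatrix(b.size(), 1);
    bCol.assignColumn(0, b);
    Matrix theta = new QRDecomposition(a).solve(bCol);
    return theta.viewColumn(0);
  }
}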
Now, LWLR doesn't work like that, since for each prediction input we need a different theta vector. So as a first step it would make sense to give the algorithm a set of training vectors (containing input vectors and target scalars) and just one prediction input vector. Each mapper would then do the same as it does now, except that it would also calculate the weight for its training vector from the training input vector and the prediction input vector. Which brings me to my question: how can I share the prediction input vector between those individual mappers? I don't want each mapper to have to load it from a file. I think a good solution would be to pass it via the configuration. On a Hadoop-related forum or list, someone suggested serializing the object you want to share into a String and putting that String into the configuration. Do you think that's a good idea? If yes, what is the proper Mahout way of serializing a Vector to a String and deserializing it back into a Vector later? Something like the sketch below is what I have in mind.
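Here's roughly what I had in mind, so you can tell me whether this is sensible or whether there's a more idiomatic Mahout way. I'm assuming VectorWritable plus Base64 from commons-codec for the stringifying, a made-up config key "lwlr.query.vector", and a Gaussian kernel with bandwidth tau for the per-training-vector weight:

import java.io.IOException;

import org.apache.commons.codec.binary.Base64;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.DataInputBuffer;
import org.apache.hadoop.io.DataOutputBuffer;
import org.apache.mahout.math.Vector;
import org.apache.mahout.math.VectorWritable;

public class QueryVectorSketch {

  // Driver side: serialize the prediction input vector into the job configuration.
  static void storeQueryVector(Configuration conf, Vector query) throws IOException {
    DataOutputBuffer out = new DataOutputBuffer();
    new VectorWritable(query).write(out);
    byte[] bytes = new byte[out.getLength()];
    System.arraycopy(out.getData(), 0, bytes, 0, out.getLength());
    conf.set("lwlr.query.vector", Base64.encodeBase64String(bytes));
  }

  // Mapper setup: read the prediction input vector back out of the configuration.
  static Vector loadQueryVector(Configuration conf) throws IOException {
    byte[] bytes = Base64.decodeBase64(conf.get("lwlr.query.vector"));
    DataInputBuffer in = new DataInputBuffer();
    in.reset(bytes, bytes.length);
    VectorWritable vw = new VectorWritable();
    vw.readFields(in);
    return vw.get();
  }

  // Gaussian kernel weight for one training vector x, given the query vector and bandwidth tau.
  static double weight(Vector x, Vector query, double tau) {
    double d2 = x.getDistanceSquared(query);
    return Math.exp(-d2 / (2.0 * tau * tau));
  }
}

Thanks, Alex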