Hi Ted,

> Passing small amounts of data via configuration is reasonable to do, but it
> isn't clear that this is a good idea for you. Do you really only want
> to pass around a single input vector for an entire map-reduce invocation?
> Map-reduce takes a looong time to get started.

Yeah, that's what I figured as well, and I wasn't sure whether map-reduce really makes sense for LWLR. But with huge amounts of data and high-dimensional input it probably would speed things up. Moreover, a single node acting as a mapper could process several input vectors while having to read the prediction input vector from the configuration only once.

Anyway, I have it working now, passing the prediction input as a String via the configuration: I use a VectorWritable to get a byte[], encode that with XStream's Base64Encoder, and on the other end decode back to a byte[] and read it again through a VectorWritable. That gives one map-reduce job per prediction input.
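The round trip looks roughly like this (untested as pasted here; the class name and configuration key are just placeholders, and error handling is left out):

    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.io.DataInputStream;
    import java.io.DataOutputStream;
    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.mahout.math.Vector;
    import org.apache.mahout.math.VectorWritable;

    import com.thoughtworks.xstream.core.util.Base64Encoder;

    public final class PredictionInputCodec {

      private static final String KEY = "lwlr.prediction.input"; // placeholder

      // driver side: serialize the prediction input vector into the job configuration
      public static void write(Configuration conf, Vector input) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        new VectorWritable(input).write(new DataOutputStream(bytes));
        conf.set(KEY, new Base64Encoder().encode(bytes.toByteArray()));
      }

      // mapper side: decode the String and read the vector back
      public static Vector read(Configuration conf) throws IOException {
        byte[] bytes = new Base64Encoder().decode(conf.get(KEY));
        VectorWritable writable = new VectorWritable();
        writable.readFields(new DataInputStream(new ByteArrayInputStream(bytes)));
        return writable.get();
      }

      private PredictionInputCodec() {}
    }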
But now that I read your reply, it becomes clear that the better solution for computing predictions for more than one prediction input vector would indeed be to read those vectors from the distributed cache or from HDFS directly, and thus formulate it as a single map-reduce job. In that case I only have to make sure that the keys are right.

> Also, I have written an implementation of LSMR for iterative linear solution.
> Would that be helpful for you?

I don't think so, the final linear equations problem isn't sparse.

> I think that you may have mentioned that you were looking at LSQR
> some time ago.

Maybe you're mixing that up with my comment regarding the LWLR algorithm, where in the end one has to compute theta = inv(A) * b. You said I shouldn't do the inversion literally, so I'm now using Colt's Algebra.solve, i.e. theta = algebra.solve(A, b).

I think by the end of the week I can put a patch in Jira; it's probably easier to discuss once there's actual code. There are a couple of open questions. For instance, to get the weights one would use a kernel, and as far as I can see nothing regarding kernels has been implemented so far. For now I've put a Kernel interface and a GaussianKernel implementation into the LWLR package, but there's probably a more appropriate place for them, since I guess other algorithms will make use of kernels as well. Moreover, I had to enable reading/writing of matrices via sequence files; I think I'll make a separate patch for that.

Cheers,
Alex
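P.S. In case it helps the discussion before the patch is up, here are rough sketches of the pieces mentioned above. First, the distributed cache variant: ship the prediction input vectors to every node and read them back in the mapper's setup. The paths, the (IntWritable, VectorWritable) key/value layout, and the surrounding class are assumptions on my part:

    import java.io.IOException;
    import java.util.HashMap;
    import java.util.Map;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.mahout.math.Vector;
    import org.apache.mahout.math.VectorWritable;

    public final class PredictionInputs {

      // driver side: make the sequence file with the prediction input
      // vectors available on all nodes
      public static void addToCache(Configuration conf, Path inputs) {
        DistributedCache.addCacheFile(inputs.toUri(), conf);
      }

      // mapper setup: read the vectors back, keeping the keys so the
      // predictions can later be matched to their input vectors
      public static Map<Integer, Vector> readFromCache(Configuration conf)
          throws IOException {
        Map<Integer, Vector> inputs = new HashMap<Integer, Vector>();
        Path[] cached = DistributedCache.getLocalCacheFiles(conf);
        SequenceFile.Reader reader =
            new SequenceFile.Reader(FileSystem.getLocal(conf), cached[0], conf);
        try {
          IntWritable key = new IntWritable();
          VectorWritable value = new VectorWritable();
          while (reader.next(key, value)) {
            // clone, since the writable instance is reused across reads
            inputs.put(key.get(), value.get().clone());
          }
        } finally {
          reader.close();
        }
        return inputs;
      }

      private PredictionInputs() {}
    }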
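Second, the solve step with Colt (dummy numbers; note that solve is an instance method on Algebra, so one goes through Algebra.DEFAULT or one's own instance):

    import cern.colt.matrix.DoubleFactory2D;
    import cern.colt.matrix.DoubleMatrix2D;
    import cern.colt.matrix.linalg.Algebra;

    public class SolveExample {
      public static void main(String[] args) {
        // solve A * theta = b directly instead of forming inv(A) explicitly
        DoubleMatrix2D A = DoubleFactory2D.dense.make(new double[][] {{4, 1}, {1, 3}});
        DoubleMatrix2D b = DoubleFactory2D.dense.make(new double[][] {{1}, {2}});
        DoubleMatrix2D theta = Algebra.DEFAULT.solve(A, b);
        System.out.println(theta); // expect roughly [0.091; 0.636]
      }
    }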
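Third, the kernel bits, more or less as they look in my working copy (the names and the exact interface are up for discussion):

    // Kernel.java
    import org.apache.mahout.math.Vector;

    // assigns a weight to a training point based on its distance to the
    // query (prediction input) point
    public interface Kernel {
      double weight(Vector trainingPoint, Vector query);
    }

    // GaussianKernel.java
    public class GaussianKernel implements Kernel {

      private final double bandwidth; // controls how "local" the regression is

      public GaussianKernel(double bandwidth) {
        this.bandwidth = bandwidth;
      }

      @Override
      public double weight(Vector trainingPoint, Vector query) {
        Vector diff = trainingPoint.minus(query);
        // w = exp(-||x_i - x||^2 / (2 * bandwidth^2))
        return Math.exp(-diff.dot(diff) / (2 * bandwidth * bandwidth));
      }
    }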
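And finally, the matrix writing, row-wise as (row index, row vector) pairs; reading is the mirror image. I'm using numRows()/viewRow() here, which may not match the exact Matrix method names in trunk:

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.mahout.math.Matrix;
    import org.apache.mahout.math.VectorWritable;

    public final class MatrixSequenceFiles {

      // write each row of the matrix as an (IntWritable, VectorWritable) record
      public static void write(FileSystem fs, Configuration conf, Path path, Matrix m)
          throws IOException {
        SequenceFile.Writer writer = SequenceFile.createWriter(
            fs, conf, path, IntWritable.class, VectorWritable.class);
        try {
          for (int row = 0; row < m.numRows(); row++) {
            writer.append(new IntWritable(row), new VectorWritable(m.viewRow(row)));
          }
        } finally {
          writer.close();
        }
      }

      private MatrixSequenceFiles() {}
    }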