Hi Ted,

> Passing small amounts of data via configuration is reasonable to do, but it
> isn't clear that this is a good idea for you. Do you really only want
> to pass around a single input vector for an entire map-reduce invocation?
> Map-reduce takes a looong time to get started.

Yeah, that's what I figured as well, and I wasn't sure whether map-reduce really makes sense for LWLR. But with huge amounts of data and high-dimensional input it probably would speed things up. Moreover, a single node acting as a mapper could process several input vectors while having to read the prediction input vector from the configuration only once.

Anyway, I have it working now, passing the prediction input as a String via the configuration: I use a VectorWritable to get a byte[], encode that with XStream's Base64Encoder, and on the other end decode back to a byte[] and read it again through a VectorWritable. That gives one map-reduce job per prediction input.
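The round trip looks roughly like this (untested as pasted here; the class name and configuration key are just placeholders, and error handling is left out):

    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.io.DataInputStream;
    import java.io.DataOutputStream;
    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.mahout.math.Vector;
    import org.apache.mahout.math.VectorWritable;

    import com.thoughtworks.xstream.core.util.Base64Encoder;

    public final class PredictionInputCodec {

      private static final String KEY = "lwlr.prediction.input"; // placeholder

      // driver side: serialize the prediction input vector into the job configuration
      public static void write(Configuration conf, Vector input) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        new VectorWritable(input).write(new DataOutputStream(bytes));
        conf.set(KEY, new Base64Encoder().encode(bytes.toByteArray()));
      }

      // mapper side: decode the String and read the vector back
      public static Vector read(Configuration conf) throws IOException {
        byte[] bytes = new Base64Encoder().decode(conf.get(KEY));
        VectorWritable writable = new VectorWritable();
        writable.readFields(new DataInputStream(new ByteArrayInputStream(bytes)));
        return writable.get();
      }

      private PredictionInputCodec() {}
    }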
But now that I read your reply, it becomes clear that the better solution for computing predictions for more than one prediction input vector would indeed be to read those vectors from the distributed cache or from HDFS directly, and thus formulate it as a single map-reduce job. In that case I only have to make sure that the keys are right.

> Also, I have written an implementation of LSMR for iterative linear solution.
> Would that be helpful for you?

I don't think so, the final linear equations problem isn't sparse.

> I think that you may have mentioned that you were looking at LSQR
> some time ago.

Maybe you're mixing that up with my comment regarding the LWLR algorithm, where in the end one has to compute theta = inv(A) * b. You said I shouldn't do the inversion literally, so I'm now using Colt's Algebra.solve, i.e. theta = algebra.solve(A, b).

I think by the end of the week I can put a patch in Jira; it's probably easier to discuss once there's actual code. There are a couple of open questions. For instance, to get the weights one would use a kernel, and as far as I can see nothing regarding kernels has been implemented so far. For now I've put a Kernel interface and a GaussianKernel implementation into the LWLR package, but there's probably a more appropriate place for them, since I guess other algorithms will make use of kernels as well. Moreover, I had to enable reading/writing of matrices via sequence files; I think I'll make a separate patch for that.

Cheers,
Alex
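P.S. In case it helps the discussion before the patch is up, here are rough sketches of the pieces mentioned above. First, the distributed cache variant: ship the prediction input vectors to every node and read them back in the mapper's setup. The paths, the (IntWritable, VectorWritable) key/value layout, and the surrounding class are assumptions on my part:

    import java.io.IOException;
    import java.util.HashMap;
    import java.util.Map;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.mahout.math.Vector;
    import org.apache.mahout.math.VectorWritable;

    public final class PredictionInputs {

      // driver side: make the sequence file with the prediction input
      // vectors available on all nodes
      public static void addToCache(Configuration conf, Path inputs) {
        DistributedCache.addCacheFile(inputs.toUri(), conf);
      }

      // mapper setup: read the vectors back, keeping the keys so the
      // predictions can later be matched to their input vectors
      public static Map<Integer, Vector> readFromCache(Configuration conf)
          throws IOException {
        Map<Integer, Vector> inputs = new HashMap<Integer, Vector>();
        Path[] cached = DistributedCache.getLocalCacheFiles(conf);
        SequenceFile.Reader reader =
            new SequenceFile.Reader(FileSystem.getLocal(conf), cached[0], conf);
        try {
          IntWritable key = new IntWritable();
          VectorWritable value = new VectorWritable();
          while (reader.next(key, value)) {
            // clone, since the writable instance is reused across reads
            inputs.put(key.get(), value.get().clone());
          }
        } finally {
          reader.close();
        }
        return inputs;
      }

      private PredictionInputs() {}
    }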
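Second, the solve step with Colt (dummy numbers; note that solve is an instance method on Algebra, so one goes through Algebra.DEFAULT or one's own instance):

    import cern.colt.matrix.DoubleFactory2D;
    import cern.colt.matrix.DoubleMatrix2D;
    import cern.colt.matrix.linalg.Algebra;

    public class SolveExample {
      public static void main(String[] args) {
        // solve A * theta = b directly instead of forming inv(A) explicitly
        DoubleMatrix2D A = DoubleFactory2D.dense.make(new double[][] {{4, 1}, {1, 3}});
        DoubleMatrix2D b = DoubleFactory2D.dense.make(new double[][] {{1}, {2}});
        DoubleMatrix2D theta = Algebra.DEFAULT.solve(A, b);
        System.out.println(theta); // expect roughly [0.091; 0.636]
      }
    }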
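Third, the kernel bits, more or less as they look in my working copy (the names and the exact interface are up for discussion):

    // Kernel.java
    import org.apache.mahout.math.Vector;

    // assigns a weight to a training point based on its distance to the
    // query (prediction input) point
    public interface Kernel {
      double weight(Vector trainingPoint, Vector query);
    }

    // GaussianKernel.java
    public class GaussianKernel implements Kernel {

      private final double bandwidth; // controls how "local" the regression is

      public GaussianKernel(double bandwidth) {
        this.bandwidth = bandwidth;
      }

      @Override
      public double weight(Vector trainingPoint, Vector query) {
        Vector diff = trainingPoint.minus(query);
        // w = exp(-||x_i - x||^2 / (2 * bandwidth^2))
        return Math.exp(-diff.dot(diff) / (2 * bandwidth * bandwidth));
      }
    }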
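And finally, the matrix writing, row-wise as (row index, row vector) pairs; reading is the mirror image. I'm using numRows()/viewRow() here, which may not match the exact Matrix method names in trunk:

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.mahout.math.Matrix;
    import org.apache.mahout.math.VectorWritable;

    public final class MatrixSequenceFiles {

      // write each row of the matrix as an (IntWritable, VectorWritable) record
      public static void write(FileSystem fs, Configuration conf, Path path, Matrix m)
          throws IOException {
        SequenceFile.Writer writer = SequenceFile.createWriter(
            fs, conf, path, IntWritable.class, VectorWritable.class);
        try {
          for (int row = 0; row < m.numRows(); row++) {
            writer.append(new IntWritable(row), new VectorWritable(m.viewRow(row)));
          }
        } finally {
          writer.close();
        }
      }

      private MatrixSequenceFiles() {}
    }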