On Thu, Oct 21, 2010 at 12:28 AM, Alexander Hans <a...@ahans.de> wrote:
> But now that I read your reply it becomes clear that the better
> solution for determining predictions for more than one prediction
> input vector would indeed be reading those vectors from the
> distributed cache or HDFS directly, and thus to formulate it as a
> single map-reduce job. In that case I only have to make sure that
> the keys are right.

That still sounds kind of upside-down to me.  The issue with this kind
of program is that there is usually a part of the data that is bounded
in size (the model) and a part of the data that is unbounded in size
(the input data).  The bounded portion is usually what is stored in the
distributed cache, even if not all mappers read all of that data.  The
unbounded part is normally parceled out to the different data nodes.
Why is your program the opposite?  Need it be?  Should it be?  (There
is a sketch of the usual pattern at the end of this message.)

> > Also, I have written an implementation of LSMR for iterative
> > linear solution.
> >
> > Would that be helpful for you?
>
> I don't think so, the final linear equations problem isn't sparse.

Cool.

> > I think that you may have mentioned that you were looking at LSQR
> > some time ago.
>
> Maybe you're mixing that up with my comment regarding the LWLR
> algorithm where one in the end has to calculate theta = inv(A) * b.
> You said I shouldn't do the inversion literally. I'm now using
> Colt's Algebra.solve as t = Algebra.solve(A, b).

OK.  We should promote that code to fully tested status.  I started in
on the LUD code a while ago (github has the current state) but didn't
finish.  Is that code using LU decomposition or QR to do the solving?
(The second sketch below touches on this.)

> I think by the end of the week I can put a patch in Jira, it's
> probably easier to discuss once there's already some code.

Great.

> There are a couple of open questions. For instance, to get the
> weight one would use a kernel. As it seems, so far nothing regarding
> kernels is implemented. For now I put a Kernel interface and a
> GaussianKernel implementation in the LWLR package, but there's
> probably a more appropriate place for this, since I guess that other
> algorithms will make use of kernels as well.

That sounds useful, but for now the LWLR package is a fine place for
that.  (A possible shape for that interface is sketched below.)

> Moreover, I had to enable reading/writing of matrices using sequence
> files, I think I will make a separate patch for that.

Isn't there a MatrixWritable class for this?  (The last sketch below
shows how I would expect it to be used.)
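A few sketches to make the above concrete.  First, the usual pattern
for the cache question: the bounded model goes into the distributed
cache and is read once per mapper, while the unbounded prediction
inputs stream through map().  This is only a sketch assuming the new
Hadoop mapreduce API; PredictionMapper, Model, and Model.read are
made-up names for illustration, not existing Mahout classes:

    import java.io.IOException;

    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.DoubleWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.mahout.math.VectorWritable;

    public class PredictionMapper
        extends Mapper<LongWritable, VectorWritable, LongWritable, DoubleWritable> {

      private Model model; // hypothetical model type

      @Override
      protected void setup(Context context) throws IOException {
        // The bounded side of the data: read the model once from the
        // distributed cache when the mapper starts up.
        Path[] cached = DistributedCache.getLocalCacheFiles(context.getConfiguration());
        model = Model.read(cached[0]); // hypothetical loader
      }

      @Override
      protected void map(LongWritable key, VectorWritable value, Context context)
          throws IOException, InterruptedException {
        // The unbounded side: one prediction per input vector.
        context.write(key, new DoubleWritable(model.predict(value.get())));
      }
    }

The driver would call DistributedCache.addCacheFile(modelUri, conf)
and point the job's input path at the prediction vectors, so the big
side of the data is what gets split across the cluster.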
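Second, on solving instead of inverting, a minimal sketch of the Colt
call with small made-up numbers.  Note that solve() is an instance
method, hence the Algebra.DEFAULT singleton.  If I remember the Colt
javadoc correctly, solve() uses LU decomposition when A is square and
QR otherwise, which would answer my LU-vs-QR question above:

    import cern.colt.matrix.DoubleMatrix2D;
    import cern.colt.matrix.impl.DenseDoubleMatrix2D;
    import cern.colt.matrix.linalg.Algebra;

    public class SolveExample {
      public static void main(String[] args) {
        DoubleMatrix2D a = new DenseDoubleMatrix2D(new double[][] {{4, 1}, {1, 3}});
        DoubleMatrix2D b = new DenseDoubleMatrix2D(new double[][] {{1}, {2}});
        // Solves a * theta = b directly; inv(a) is never formed.
        DoubleMatrix2D theta = Algebra.DEFAULT.solve(a, b);
        System.out.println(theta);
      }
    }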
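Third, for the kernel question, here is the shape I would expect the
interface to take.  These are guesses at what your patch contains, not
the actual code; the weight computed is w = exp(-||x - xi||^2 / (2 * tau^2)):

    import org.apache.mahout.math.Vector;

    // Kernel.java -- hypothetical interface; the actual patch may differ.
    public interface Kernel {
      /** Weight of training point xi relative to query point x. */
      double apply(Vector x, Vector xi);
    }

    // GaussianKernel.java
    public class GaussianKernel implements Kernel {
      private final double tau; // bandwidth

      public GaussianKernel(double tau) {
        this.tau = tau;
      }

      @Override
      public double apply(Vector x, Vector xi) {
        double d2 = x.getDistanceSquared(xi);
        // w = exp(-||x - xi||^2 / (2 * tau^2))
        return Math.exp(-d2 / (2 * tau * tau));
      }
    }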
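Finally, on the sequence-file point: MatrixWritable lives in
org.apache.mahout.math, and I would expect writing a matrix to look
roughly like this (the path and the choice of IntWritable keys are
arbitrary, just for illustration):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.mahout.math.DenseMatrix;
    import org.apache.mahout.math.MatrixWritable;

    public class WriteMatrix {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path path = new Path("lwlr/matrix.seq"); // example path

        // One key/value pair per matrix.
        SequenceFile.Writer writer = SequenceFile.createWriter(
            fs, conf, path, IntWritable.class, MatrixWritable.class);
        try {
          writer.append(new IntWritable(0), new MatrixWritable(new DenseMatrix(2, 2)));
        } finally {
          writer.close();
        }
      }
    }

Reading it back is the mirror image with SequenceFile.Reader, so a
separate patch may not be needed for that part.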