Hi Sisir,
What operations do you want to do on a distributed matrix? We don't really
have any scalable 3D operations at this time (I'm not sure I know of any
algorithms that require them), but the matrix (2D) and vector (1D)
operations we do currently support in a distributed way are all in the
DistributedRowMatrix class.
Some of the things we don't currently have in that class are ways to
mutate the matrix itself, but we certainly should have methods like:
    public DistributedRowMatrix assign(UnaryFunction f) {
      // apply f to every element of every row
      for (Vector row : this) { row.assign(f); }
      return this;
    }
    public DistributedRowMatrix assign(Vector v, BinaryFunction f) {
      // combine every row element-wise with v using f
      for (Vector row : this) { row.assign(v, f); }
      return this;
    }
These would at least allow you to update rows in a distributed fashion.
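
To make the row-wise idea concrete, here is a minimal in-memory sketch in plain Java. It is not Mahout code: RowAssignSketch, its double[][] storage, and the use of java.util.function operators in place of UnaryFunction/BinaryFunction are all assumptions for illustration. The only point is that each row is updated independently of the others, which is what would let a real implementation hand one row to each mapper.

```java
import java.util.Arrays;
import java.util.function.DoubleBinaryOperator;
import java.util.function.DoubleUnaryOperator;

// Hypothetical stand-in for a row-partitioned matrix; NOT Mahout's DistributedRowMatrix.
class RowAssignSketch {
    private final double[][] rows;

    RowAssignSketch(double[][] rows) {
        this.rows = rows;
    }

    // Apply f to every element. Rows are processed independently (here via a
    // parallel stream), mirroring how a distributed job could take one row per task.
    RowAssignSketch assign(DoubleUnaryOperator f) {
        Arrays.stream(rows).parallel().forEach(row -> {
            for (int j = 0; j < row.length; j++) {
                row[j] = f.applyAsDouble(row[j]);
            }
        });
        return this;
    }

    // Combine every row element-wise with v: row[j] = f(row[j], v[j]).
    RowAssignSketch assign(double[] v, DoubleBinaryOperator f) {
        Arrays.stream(rows).parallel().forEach(row -> {
            if (row.length != v.length) {
                throw new IllegalArgumentException("row and vector lengths differ");
            }
            for (int j = 0; j < row.length; j++) {
                row[j] = f.applyAsDouble(row[j], v[j]);
            }
        });
        return this;
    }

    double get(int i, int j) {
        return rows[i][j];
    }

    public static void main(String[] args) {
        RowAssignSketch m = new RowAssignSketch(new double[][] {{1, 2}, {3, 4}});
        // Square every entry, then add the vector (10, 20) to each row.
        m.assign(x -> x * x).assign(new double[] {10, 20}, Double::sum);
        System.out.println(m.get(0, 0) + " " + m.get(1, 1));
    }
}
```

Note that this sketch mutates rows held in memory. For a matrix stored on HDFS, an assign would presumably have to write the updated rows back out as a new file rather than change them in place.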
What kind of operations would you want to do on a huge distributed
matrix?
-jake
On Sun, May 30, 2010 at 4:51 AM, Sisir Koppaka <[email protected]> wrote:
> Hi,
> I was looking for distributed map-reduce based 1D, 2D, and 3D operations on
> HDFS for the RBM algorithm. o.a.m.math.matrix has them but they are marked
> "@deprecated until unit tests are in place. Until this time, this
> class/interface is unsupported."
>
> Jake posted about o.a.m.math.hadoop.decomposer.DistributedLanczosSolver in
> Shannon's thread a few days ago - is there something like that for
> distributed map-reduce operations on HDFS for generic matrices? I need these
> operations because the matrices don't fit in memory for large datasets.
>
> --
> Sisir
>