> Please don't call it V.  That is normally the name of the other matrix of
> singular vectors in SVD.  Calling a row normalized version of U by that
> name would be terminally confusing.


Excellent point. I'll call it W.

Some other questions:

1) In converting my code from using Matrix objects (Dense, Sparse) to
DistributedRowMatrix instances, I've run into the problem of not being able to
perform some of the basic Matrix operations, such as zSum(), or to raise
elements to a power via UnaryFunctions, etc. I could certainly create
Map/Reduce jobs to do these tasks, but is this functionality that could be
included in DistributedRowMatrix itself?
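
To make the request concrete, here is a minimal plain-Java sketch of the two operations I mean, written against double[] rows rather than any Mahout type. RowOps, rowSum, zSum and mapRow are hypothetical names of my own, not existing DistributedRowMatrix methods. The point is only that both operations decompose by row: a per-row step that a mapper could run, plus (for the sum) a trivial combine step.

```java
import java.util.function.DoubleUnaryOperator;

// Hypothetical helper for illustration only; not part of Mahout.
final class RowOps {
    // Sum of one row: the partial result a map step could emit per row.
    static double rowSum(double[] row) {
        double sum = 0.0;
        for (double v : row) {
            sum += v;
        }
        return sum;
    }

    // Sum of every element: adds the per-row partial sums, as a reduce step would.
    static double zSum(double[][] rows) {
        double total = 0.0;
        for (double[] row : rows) {
            total += rowSum(row);
        }
        return total;
    }

    // Applies f to each element of a row and returns a new row (input is not modified).
    static double[] mapRow(double[] row, DoubleUnaryOperator f) {
        double[] out = new double[row.length];
        for (int i = 0; i < row.length; i++) {
            out[i] = f.applyAsDouble(row[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] a = {{1.0, 2.0}, {3.0, 4.0}};
        System.out.println(zSum(a));
        // Raise each element of the second row to the power 2.
        double[] squared = mapRow(a[1], v -> Math.pow(v, 2.0));
        System.out.println(squared[0] + " " + squared[1]);
    }
}
```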

2) For debugging purposes (since I'm using data sets small enough to be held
in memory), I set up a loop over my DistributedRowMatrix (after initializing
it and calling .configure() ):

for (MatrixSlice m : A) {
  System.out.println(m.vector().zSum());
}

However, I received an exception on the line:

Exception in thread "main" java.lang.IllegalStateException:
java.io.IOException: wrong value class:
org.apache.mahout.math.vectorwrita...@6f649b44 is not class
org.apache.hadoop.io.Text

My raw data resides as a CSV file, on which I've run seqdirectory, and I'm
passing the path to the SequenceFiles to the DistributedRowMatrix
constructor. I've looked over the syntheticcontrol example and the "Creating
Vectors from Text" wiki page and am wondering if I'm missing something very
simple. Should I simply run a job like the one in the syntheticcontrol
example, converting the SequenceFiles from one format to another, and then
pass the converted files to the DistributedRowMatrix constructor? Or is it
something else?
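
If a conversion job is the answer, the per-record work I have in mind is just parsing each CSV line into a row of doubles. Below is a minimal sketch of that parsing step in plain Java, with no Hadoop or Mahout types. CsvRows and parseRow are hypothetical names, and the sketch assumes every field is a plain number with no quoting, headers, or empty fields. Wrapping the result in a vector and writing it back out as a SequenceFile is deliberately left out, since that is the part I'm unsure about.

```java
// Hypothetical helper for illustration only; not part of Mahout or Hadoop.
final class CsvRows {
    // Parses one comma-separated line of numbers into a dense row.
    // Assumes no quoted fields and no empty fields; throws
    // NumberFormatException if a field is not a valid number.
    static double[] parseRow(String line) {
        String[] fields = line.trim().split(",");
        double[] row = new double[fields.length];
        for (int i = 0; i < fields.length; i++) {
            row[i] = Double.parseDouble(fields[i].trim());
        }
        return row;
    }

    public static void main(String[] args) {
        double[] row = parseRow("1.5, 2, -3");
        for (double v : row) {
            System.out.println(v);
        }
    }
}
```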

3) I noticed JobConf has been deprecated as of Hadoop 0.20.2, but it's still
used by DistributedRowMatrix. I've been seeing all the tickets about
upgrading to the current Hadoop APIs, so I assume this is on the to-do list.
I'd be happy to help whoever is working on this particular item, or to start
working on it myself if no one is.

Thank you again for your help!

Regards,
Shannon
