Hi Ted,
Thank you very much. That is very valuable insight into a more robust
input format, and I've already started implementing it.
I finished the new M/R process to reflect the new assumed input format
(and submitted the patch), but I'm getting an exception I can't seem to
diagnose. When I start the program and the INFO lines start rolling, I
get the following right before the M/R task begins:
java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.io.IntWritable
    at org.apache.mahout.clustering.eigencuts.EigencutsInputMapper.map(EigencutsInputMapper.java:22)
Line 22, the line referred to in the message, is the class declaration:
public class EigencutsInputMapper extends Mapper<IntWritable, Text,
    IntWritable, DistributedRowMatrix.MatrixEntryWritable> {
I searched all my source files and found no mention of LongWritable
anywhere (except one commented-out line). It was in my previous
implementation, but I have run mvn clean multiple times. Any thoughts
would be appreciated.
Thank you again!
Regards,
Shannon
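A possible explanation, offered as a sketch rather than a confirmed diagnosis: if the job reads its input with Hadoop's default TextInputFormat, each record arrives with a LongWritable key (the byte offset of the line) and a Text value, regardless of what the mapper declares. A mapper declared with IntWritable input keys would then fail on the first record even though LongWritable appears nowhere in the source, and mvn clean would not change that. The exception may point at the class declaration because the cast happens in a compiler-generated bridge method, which javac typically attributes to that line. The plain-Java example below reproduces the mechanism without Hadoop; SimpleMapper, IntKeyMapper, and feed are hypothetical stand-ins, not Hadoop or Mahout classes.

```java
// Plain-Java stand-ins (hypothetical, not the Hadoop API) showing how a key-type
// mismatch surfaces as a ClassCastException inside a generic mapper.
class KeyTypeMismatchDemo {

    // Stand-in for a framework base class such as Mapper<KEYIN, VALUEIN, ...>.
    abstract static class SimpleMapper<K, V> {
        abstract String map(K key, V value);
    }

    // Declares Integer keys, analogous to declaring IntWritable input keys.
    static class IntKeyMapper extends SimpleMapper<Integer, String> {
        @Override
        String map(Integer key, String value) {
            return key + ":" + value;
        }
    }

    // Stand-in for the framework: it passes along whatever key object the
    // input format produced, without knowing the mapper's declared key type.
    // The call goes through the erased map(Object, Object) bridge method,
    // which casts the key to Integer before delegating.
    @SuppressWarnings({"unchecked", "rawtypes"})
    static String feed(SimpleMapper mapper, Object key, String value) {
        return mapper.map(key, value);
    }

    // Returns true when feeding this key to the Integer-keyed mapper fails the cast.
    static boolean failsWithClassCast(Object key) {
        try {
            feed(new IntKeyMapper(), key, "0,1,0.5");
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // An Integer key matches the declared type and is accepted.
        System.out.println(feed(new IntKeyMapper(), 7, "0,1,0.5"));
        // A Long key (analogous to a LongWritable byte offset) fails the cast.
        System.out.println(failsWithClassCast(0L));
    }
}
```

If this is indeed the cause, two options would be to declare the mapper's input key as LongWritable and parse the node indices out of the Text value, or to configure an input format whose key type matches the declaration.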
On 6/15/2010 7:03 PM, Ted Dunning wrote:
Shannon,
Nice work so far.
I think it is a bit more customary to enter a graph by giving the integer
pairs that represent the starting and ending nodes for each arc. That
avoids the memory allocation problem you hit if one node is connected to
millions of others. It also may solve your problem of the distributed row
matrix since you could write a reducer to gather everything to the right
place for writing a row. In doing that, you would inherently have the row
number available because that would be the grouping key.
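The gathering step described above can be sketched in plain Java, outside of Hadoop. This is only an illustration under stated assumptions: each input line is taken to have the form "source,target,weight" (the weight field is an assumption added here for an affinity matrix; the suggestion itself only mentions integer pairs), and EdgeListToRows and groupByRow are hypothetical names. In a real job the grouping would be done by the shuffle, with the source node as the map output key, so the reducer would see one row's entries at a time.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: groups edge-list lines by source node, mimicking what
// a reducer keyed on the row number would receive.
class EdgeListToRows {

    // Parses lines of the assumed form "source,target,weight" and returns a
    // map from row index to that row's (column -> weight) entries.
    static Map<Integer, Map<Integer, Double>> groupByRow(List<String> lines) {
        Map<Integer, Map<Integer, Double>> rows = new TreeMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            String[] parts = trimmed.split(",");
            if (parts.length != 3) {
                throw new IllegalArgumentException(
                        "Expected source,target,weight but got: " + line);
            }
            int row = Integer.parseInt(parts[0].trim());
            int col = Integer.parseInt(parts[1].trim());
            double weight = Double.parseDouble(parts[2].trim());
            // The row index is the grouping key, so it is always available
            // when the row is written out.
            rows.computeIfAbsent(row, r -> new TreeMap<>()).put(col, weight);
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> edges = Arrays.asList("0,1,0.5", "1,0,0.5", "0,2,0.25");
        System.out.println(groupByRow(edges));
    }
}
```

Because each line carries a single arc, no single record grows with the degree of a node; only the grouped row does, and that is assembled on the reduce side.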
If you keep the current one matrix row per csv line, I would recommend
putting the source node at the beginning of the line.
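For the row-per-line alternative, a minimal parsing sketch might look like the following. It assumes a line of the form "rowIndex,v0,v1,...", with the source node first and the dense row values after it; IndexedRowParser and IndexedRow are hypothetical names, not Mahout classes.

```java
// Hypothetical parser for one CSV line whose first field is the source node
// (row index) and whose remaining fields are that row's values.
class IndexedRowParser {

    // Simple holder for a parsed row.
    static final class IndexedRow {
        final int rowIndex;
        final double[] values;

        IndexedRow(int rowIndex, double[] values) {
            this.rowIndex = rowIndex;
            this.values = values;
        }
    }

    static IndexedRow parse(String line) {
        String[] parts = line.trim().split(",");
        if (parts.length < 2) {
            throw new IllegalArgumentException(
                    "Expected rowIndex followed by at least one value: " + line);
        }
        int rowIndex = Integer.parseInt(parts[0].trim());
        double[] values = new double[parts.length - 1];
        for (int i = 1; i < parts.length; i++) {
            values[i - 1] = Double.parseDouble(parts[i].trim());
        }
        return new IndexedRow(rowIndex, values);
    }

    public static void main(String[] args) {
        IndexedRow row = parse("2, 0.1, 0.0, 0.9");
        System.out.println(row.rowIndex + " has " + row.values.length + " values");
    }
}
```

With the row index carried in the line itself, the parser does not depend on the order in which lines reach a given map task.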
On Tue, Jun 15, 2010 at 3:58 PM, Shannon Quinn <[email protected]> wrote:
1) I've made the assumption so far that the input to my clustering
algorithm will be a single CSV file containing the entire affinity matrix,
where each line in the file is a row in the matrix. Is there another input
approach that would work better for reading this affinity matrix?