Re: M/R capturing line numbers in text files

Shannon Quinn Tue, 15 Jun 2010 21:55:55 -0700

Hi Ted,

Thank you very much - very valuable insight as to a more robust input format. I've already started implementing it.

I finished the new M/R process to reflect the new assumed input format (submitted the patch), but I'm getting an exception I can't seem to diagnose. When I start the program and the INFO lines start rolling from the process, right before the M/R task begins I get the following:

java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.io.IntWritable
    at org.apache.mahout.clustering.eigencuts.EigencutsInputMapper.map(EigencutsInputMapper.java:22)


The line 22 referred to in the message is:

public class EigencutsInputMapper extends Mapper<IntWritable, Text, IntWritable, DistributedRowMatrix.MatrixEntryWritable> {

I did a search in all my source files; no mention anywhere (except one commented-out line) of LongWritable. It was in my previous implementation, but I performed mvn clean multiple times. Any thoughts would be appreciated.


Thank you again!

Regards,
Shannon

On 6/15/2010 7:03 PM, Ted Dunning wrote:

Shannon,

Nice work so far.

I think it is a bit more customary to enter a graph by giving the integer
pairs that represent the starting and ending nodes for each arc.  That
avoids the memory allocation problem you hit if one node is connected to
millions of others.  It also may solve your problem of the distributed row
matrix since you could write a reducer to gather everything to the right
place for writing a row.  In doing that, you would inherently have the row
number available because that would be the grouping key.
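
A minimal sketch of the grouping idea above, written in plain Java with no Hadoop dependency so it can be run on its own. It assumes each input line has the form "source,destination,weight"; the weight field, the class name EdgeListRows, and the method groupByRow are illustrative assumptions, not part of Mahout or of the patch discussed in this thread. In a real M/R job the mapper would emit the source node as the key and the reducer would receive all entries for that row together.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class EdgeListRows {

    /**
     * Groups "source,destination,weight" lines by source node.
     * The outer key plays the role of the reducer's grouping key (the row number);
     * the inner map holds column -> value for that row.
     */
    static Map<Integer, Map<Integer, Double>> groupByRow(List<String> lines) {
        Map<Integer, Map<Integer, Double>> rows = new TreeMap<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            String[] parts = line.trim().split(",");
            int row = Integer.parseInt(parts[0].trim());
            int col = Integer.parseInt(parts[1].trim());
            double value = Double.parseDouble(parts[2].trim());
            // One entry at a time: no full row is ever held in a single input line.
            rows.computeIfAbsent(row, k -> new TreeMap<>()).put(col, value);
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> input = List.of("0,1,0.5", "1,0,0.5", "0,2,0.25");
        // Each printed line corresponds to what one reduce call would see.
        groupByRow(input).forEach((row, entries) ->
                System.out.println("row " + row + " -> " + entries));
    }
}
```

Because the row number is the key, it is available when the row is written, without relying on the line's position in the file.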

If you keep the current one matrix row per csv line, I would recommend
putting the source node at the beginning of the line.
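
For the row-per-line alternative, a small sketch of what parsing such a line might look like. It assumes a line such as "3,0.1,0.5,0.2", where the first field is the source node (row index) and the remaining fields are that row's values; the class and method names are hypothetical.

```java
public class RowPrefixedLine {

    /** Returns the row index stored in the first CSV field. */
    static int rowIndex(String line) {
        String first = line.substring(0, line.indexOf(','));
        return Integer.parseInt(first.trim());
    }

    /** Returns the matrix values that follow the row index. */
    static double[] rowValues(String line) {
        String[] parts = line.trim().split(",");
        double[] values = new double[parts.length - 1];
        for (int i = 1; i < parts.length; i++) {
            values[i - 1] = Double.parseDouble(parts[i].trim());
        }
        return values;
    }

    public static void main(String[] args) {
        String line = "3,0.1,0.5,0.2";
        System.out.println("row " + rowIndex(line) + " has "
                + rowValues(line).length + " values");
    }
}
```

With the row index carried in the line itself, the mapper does not need to know where in the file the line came from.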


On Tue, Jun 15, 2010 at 3:58 PM, Shannon Quinn<[email protected]>  wrote:

1) I've made the assumption so far that the input to my clustering
algorithm will be a single CSV file containing the entire affinity matrix,
where each line in the file is a row in the matrix. Is there another input
approach that would work better for reading this affinity matrix?
