This has come up before, and it can be a bit tricky to diagnose without looking through the code carefully. The basic problem is that something has produced data that uses a long as an ID while your mapper is expecting an int.
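In particular, if the job is reading the CSV with Hadoop's default TextInputFormat, the framework hands the mapper LongWritable keys (the byte offset of each line) no matter what your own code mentions, which would explain seeing LongWritable even though it never appears in your sources. A minimal sketch of that fix is below; the body of map() is a guess at the intended parsing, since only the class declaration was posted:

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.mahout.math.hadoop.DistributedRowMatrix;

// The key change is the first type parameter: TextInputFormat supplies
// LongWritable byte-offset keys, so the mapper declares LongWritable and
// simply ignores the offset.
public class EigencutsInputMapper extends
    Mapper<LongWritable, Text, IntWritable, DistributedRowMatrix.MatrixEntryWritable> {

  @Override
  protected void map(LongWritable offset, Text line, Context context)
      throws IOException, InterruptedException {
    // Assumed input: one "row,col,value" triple per line (hypothetical;
    // adjust to whatever the real format is).
    String[] tokens = line.toString().split(",");
    int row = Integer.parseInt(tokens[0].trim());

    DistributedRowMatrix.MatrixEntryWritable entry =
        new DistributedRowMatrix.MatrixEntryWritable();
    entry.setRow(row);
    entry.setCol(Integer.parseInt(tokens[1].trim()));
    entry.setVal(Double.parseDouble(tokens[2].trim()));

    context.write(new IntWritable(row), entry);
  }
}

If the input really is supposed to be a SequenceFile keyed by IntWritable, then the likely culprit is instead a driver that was never switched off of text input; either way, the mapper's declared input key has to match what the configured InputFormat actually produces.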
Have you posted your code as a patch on the JIRA, or as a git link?

On Tue, Jun 15, 2010 at 9:55 PM, Shannon Quinn <[email protected]> wrote:

> Hi Ted,
>
> Thank you very much - very valuable insight as to a more robust input
> format. I've already started implementing it.
>
> I finished the new M/R process to reflect the new assumed input format
> (submitted the patch), but I'm getting an exception I can't seem to
> diagnose. When I start the program, and the INFO lines start rolling from
> the process, right before the M/R task begins I get the following:
>
> java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be
> cast to org.apache.hadoop.io.IntWritable
>   at org.apache.mahout.clustering.eigencuts.EigencutsInputMapper.map(EigencutsInputMapper.java:22)
>
> Line 22, referred to in the message, is:
>
> public class EigencutsInputMapper extends Mapper<IntWritable, Text,
> IntWritable, DistributedRowMatrix.MatrixEntryWritable> {
>
> I did a search in all my source files; there is no mention of LongWritable
> anywhere (except one commented-out line). It was in my previous
> implementation, but I have run mvn clean multiple times. Any thoughts
> would be appreciated.
>
> Thank you again!
>
> Regards,
> Shannon
>
>
> On 6/15/2010 7:03 PM, Ted Dunning wrote:
>
>> Shannon,
>>
>> Nice work so far.
>>
>> I think it is a bit more customary to enter a graph by giving the integer
>> pairs that represent the starting and ending nodes of each arc. That
>> avoids the memory allocation problem you hit when one node is connected
>> to millions of others. It may also solve your problem with the
>> distributed row matrix, since you could write a reducer to gather
>> everything to the right place for writing a row. In doing that, you
>> would inherently have the row number available, because it would be the
>> grouping key.
>>
>> If you keep the current one-matrix-row-per-CSV-line format, I would
>> recommend putting the source node at the beginning of the line.
>>
>>
>> On Tue, Jun 15, 2010 at 3:58 PM, Shannon Quinn <[email protected]> wrote:
>>
>>> 1) I've made the assumption so far that the input to my clustering
>>> algorithm will be a single CSV file containing the entire affinity
>>> matrix, where each line in the file is a row in the matrix. Is there
>>> another input approach that would work better for reading this
>>> affinity matrix?
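To make the pair-per-line suggestion from the quoted message concrete: with one (source, target, weight) entry per line and the source node as the map output key, each reduce call sees every entry for one matrix row, and the row index is simply the grouping key. A rough sketch, assuming Mahout's MatrixEntryWritable as the intermediate value and a hypothetical affinity.dimensions setting for the matrix size:

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.mahout.math.RandomAccessSparseVector;
import org.apache.mahout.math.Vector;
import org.apache.mahout.math.VectorWritable;
import org.apache.mahout.math.hadoop.DistributedRowMatrix;

// Because the shuffle groups map output by the source-node key, each call
// to reduce() receives every (col, value) entry for one row of the affinity
// matrix, and the row number is just the reduce key.
public class AffinityRowReducer extends
    Reducer<IntWritable, DistributedRowMatrix.MatrixEntryWritable,
        IntWritable, VectorWritable> {

  @Override
  protected void reduce(IntWritable row,
      Iterable<DistributedRowMatrix.MatrixEntryWritable> entries,
      Context context) throws IOException, InterruptedException {
    // "affinity.dimensions" is a made-up config key here; a real driver
    // would pass the matrix dimension however it sees fit.
    int dim = context.getConfiguration().getInt("affinity.dimensions",
        Integer.MAX_VALUE);
    Vector rowVector = new RandomAccessSparseVector(dim);
    for (DistributedRowMatrix.MatrixEntryWritable entry : entries) {
      rowVector.setQuick(entry.getCol(), entry.getVal());
    }
    context.write(row, new VectorWritable(rowVector));
  }
}

The IntWritable/VectorWritable pairs this emits are the same layout a DistributedRowMatrix is backed by, so the reducer output could serve as the matrix directly.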
