Sorry, what does that mean? :)
What does it mean for two vectors to be "dotted", and why wouldn't they be
the same size? What should I investigate?
I am basically running my complete k-means scenario (same input data, same
number-of-clusters parameter, etc.) and only replacing the KMeansDriver.main
step with a DirichletDriver.main call. The arguments are adjusted, of course,
since k-means and Dirichlet do not take the same arguments.
I am not sure what values I should give for the alpha, iterations and
reductions arguments. Here is my current argument set:
args = new String[] {
    "--input",
    "/store/dev/inst/mahout-0.2/email-clustering/1-solr-vectors/solr_index.vec",
    "--output", config.getClustersDir(),
    "--modelClass",
    "org.apache.mahout.clustering.dirichlet.models.NormalModelDistribution",
    "--maxIter", "15",
    "--alpha", "1.0",
    "--k", config.getClustersCount(),
    "--maxRed", "2"
};
Is anything suspicious in there?
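
My best guess is that "dotted" refers to the dot product, which is only
defined for two vectors with the same number of elements. Here is a minimal
plain-Java sketch of how I picture the failure. It is not Mahout's code: the
class DotSketch and its dot method are made up for illustration, and I am only
assuming that AbstractVector.dot does a similar size check before multiplying.

```java
class DotSketch {

    // Dot product of two dense vectors. Refuses inputs of different sizes,
    // which is the situation I assume a CardinalityException reports.
    static double dot(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException(
                "cardinality mismatch: " + a.length + " vs " + b.length);
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] doc = {1.0, 0.0, 2.0, 0.0};       // a 4-element "document" vector
        double[] sameSize = {0.5, 0.5, 0.5, 0.5};  // same size, so dot works
        double[] tooShort = {0.5, 0.5};            // different size, so dot fails

        System.out.println(dot(doc, sameSize));
        try {
            dot(doc, tooShort);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

If that picture is right, my input vectors and the vectors held by the
NormalModel would have to have been created with different sizes.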
On Wed, Jan 13, 2010 at 2:44 AM, Grant Ingersoll wrote:
> I don't have the code in front of me, but if I had to guess based on the
> location of the stack trace, I'm going to guess it is b/c the sizes of the
> two vectors being "dotted" aren't the same.
>
> On Jan 12, 2010, at 6:46 PM, Bogdan Vatkov wrote:
>
> > what could be the reason for this Cardinality exception?
> >
> > 10/01/13 01:41:09 INFO clustering.SolrToMahoutDriver: Wrote: 174 vectors
> > 10/01/13 01:41:09 INFO clustering.SolrToMahoutDriver: Dictionary Output file: /store/dev/inst/mahout-0.2/email-clustering/1-solr-vectors/dictionary.txt
> > 10/01/13 01:41:11 INFO dirichlet.DirichletDriver: Iteration 0
> > 10/01/13 01:41:11 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
> > 10/01/13 01:41:11 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
> > 10/01/13 01:41:11 INFO mapred.FileInputFormat: Total input paths to process : 1
> > 10/01/13 01:41:11 INFO mapred.JobClient: Running job: job_local_0001
> > 10/01/13 01:41:11 INFO mapred.FileInputFormat: Total input paths to process : 1
> > 10/01/13 01:41:11 INFO compress.CodecPool: Got brand-new decompressor
> > 10/01/13 01:41:11 INFO mapred.MapTask: numReduceTasks: 1
> > 10/01/13 01:41:11 INFO mapred.MapTask: io.sort.mb = 100
> > 10/01/13 01:41:12 INFO mapred.MapTask: data buffer = 79691776/99614720
> > 10/01/13 01:41:12 INFO mapred.MapTask: record buffer = 262144/327680
> > 10/01/13 01:41:12 WARN mapred.LocalJobRunner: job_local_0001
> > org.apache.mahout.matrix.CardinalityException
> > at org.apache.mahout.matrix.AbstractVector.dot(AbstractVector.java:92)
> > at org.apache.mahout.clustering.dirichlet.models.NormalModel.pdf(NormalModel.java:111)
> > at org.apache.mahout.clustering.dirichlet.models.NormalModel.pdf(NormalModel.java:28)
> > at org.apache.mahout.clustering.dirichlet.DirichletState.adjustedProbability(DirichletState.java:129)
> > at org.apache.mahout.clustering.dirichlet.DirichletMapper.normalizedProbabilities(DirichletMapper.java:111)
> > at org.apache.mahout.clustering.dirichlet.DirichletMapper.map(DirichletMapper.java:47)
> > at org.apache.mahout.clustering.dirichlet.DirichletMapper.map(DirichletMapper.java:38)
> > at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
> > at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
> > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
> > at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:176)
> > 10/01/13 01:41:12 INFO mapred.JobClient: map 0% reduce 0%
> > 10/01/13 01:41:12 INFO mapred.JobClient: Job complete: job_local_0001
> > 10/01/13 01:41:12 INFO mapred.JobClient: Counters: 0
> > 10/01/13 01:41:12 WARN dirichlet.DirichletDriver: java.io.IOException: Job failed!
> > java.io.IOException: Job failed!
> > at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1252)
> > at org.apache.mahout.clustering.dirichlet.DirichletDriver.runIteration(DirichletDriver.java:214)
> > at org.apache.mahout.clustering.dirichlet.DirichletDriver.runJob(DirichletDriver.java:139)
> > at org.apache.mahout.clustering.dirichlet.DirichletDriver.main(DirichletDriver.java:109)
> > at org.bogdan.clustering.mbeans.Clusters.doClustering(Clusters.java:244)
> > at org.bogdan.clustering.mbeans.Clusters.access$0(Clusters.java:175)
> > at org.bogdan.clustering.mbeans.Clusters$1.run(Clusters.java:148)
> > at java.lang.Thread.run(Thread.java:619)
>
>
--
Best regards,
Bogdan