Yeah, that was it. I've added a CardinalityCorrectionMapper to MAHOUT-371, which does the job.
On 16/07/10 16:40, Richard Simon Just wrote:
> Ah, that makes sense. Dawns on me that Sean pointed to this earlier but
> the implications hadn't clicked with me. Will write this now. Cheers
>
> On 16/07/10 15:09, Jake Mannix wrote:
>
>> Richard,
>>
>> I think this is because the ToItemPrefMapper and ToUserVectorReducer are
>> spitting out SequentialAccessSparseVectors with cardinality MAX_INTEGER,
>> which doesn't matter in all of the Taste stuff, because no dense vector is
>> ever created with this cardinality. For SVD, you do need dense vectors, so
>> you really need to make sure the SASVectors have the correct cardinality.
>>
>> A simple M/R job which runs over the output of the ToXYZPrefMapper/Reducer
>> sequence files, and spits out new Vectors with the correct cardinality but
>> the same data should do the trick. It may require new constructors for
>> these vectors however, to do it most efficiently (i.e. just copy references
>> to inner data structures, but set a new value for the cardinality - you
>> can't just modify it because it is probably final).
>>
>> -jake
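
For anyone reading this later, here is a minimal sketch of the reference-sharing idea Jake describes, in plain Java with no Mahout or Hadoop dependencies. It is not the patch attached to MAHOUT-371. The class SparseVectorSketch and its withCardinality method are hypothetical stand-ins made up for illustration, and Mahout's real vector classes may look quite different.

```java
import java.util.Arrays;

/** Hypothetical stand-in for a sequential-access sparse vector (not Mahout's class). */
final class SparseVectorSketch {
    private final int cardinality;  // final, so it cannot simply be modified in place
    private final int[] indices;    // positions of the non-zero entries
    private final double[] values;  // values[i] is the entry at position indices[i]

    SparseVectorSketch(int cardinality, int[] indices, double[] values) {
        if (indices.length != values.length) {
            throw new IllegalArgumentException("indices and values differ in length");
        }
        this.cardinality = cardinality;
        this.indices = indices;
        this.values = values;
    }

    /** New vector with a different cardinality that shares the same inner arrays. */
    SparseVectorSketch withCardinality(int newCardinality) {
        for (int index : indices) {
            if (index < 0 || index >= newCardinality) {
                throw new IllegalArgumentException("index " + index + " does not fit");
            }
        }
        // Only references are copied here; the entry data itself is not duplicated.
        return new SparseVectorSketch(newCardinality, indices, values);
    }

    int cardinality() {
        return cardinality;
    }

    /** Dense copy; only practical once the cardinality is the real dimension. */
    double[] toDense() {
        double[] dense = new double[cardinality];
        for (int i = 0; i < indices.length; i++) {
            dense[indices[i]] = values[i];
        }
        return dense;
    }
}

class CardinalityCorrectionDemo {
    public static void main(String[] args) {
        // Vector as emitted upstream: two entries, but a huge nominal cardinality.
        SparseVectorSketch raw = new SparseVectorSketch(
                Integer.MAX_VALUE, new int[] {1, 3}, new double[] {4.5, 2.0});
        // What a correction step would emit, assuming the true dimension is 5.
        SparseVectorSketch fixed = raw.withCardinality(5);
        System.out.println(Arrays.toString(fixed.toDense()));
        // prints [0.0, 4.5, 0.0, 2.0, 0.0]
    }
}
```

In an M/R setting the map step would apply something like withCardinality to each incoming vector and write the result back out under the same key. Sharing the arrays is only safe as long as neither vector is mutated afterwards.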
