+dev -user

I don't understand your question. Can you give some more details?

--sebastian

On 08.09.2011 13:51, 张玉东 wrote:
> Ok, I understand this point, but in this step the top similar items have
> already been chosen, so is it still necessary to select the top
> "maxSimilaritiesPerItem" items in the job "mostSimilarItems"?
>
> -----Original Message-----
> From: Sebastian Schelter [mailto:[email protected]]
> Sent: 8 September 2011 19:42
> To: [email protected]
> Subject: Re: how to understand the parameter "maxSimilaritiesPerItem"
>
> The code snippet is invoked in a job that uses "Secondary Sort", which
> means that the "entries" will be seen in descending order by the
> reducer. That's why we only need to process the first ones.
>
> --sebastian
>
> On 08.09.2011 13:38, 张玉东 wrote:
>> Hello,
>> In the ItemSimilarityJob, the parameter "maxSimilaritiesPerItem" is first
>> used in the 7th map/reduce job "asMatrix" as
>>
>>   protected void reduce(SimilarityMatrixEntryKey key,
>>       Iterable<DistributedRowMatrix.MatrixEntryWritable> entries,
>>       Context ctx) throws IOException, InterruptedException {
>>     RandomAccessSparseVector temporaryVector =
>>         new RandomAccessSparseVector(Integer.MAX_VALUE, maxSimilaritiesPerRow);
>>     int similaritiesSet = 0;
>>     for (DistributedRowMatrix.MatrixEntryWritable entry : entries) {
>>       temporaryVector.setQuick(entry.getCol(), entry.getVal());
>>       if (++similaritiesSet == maxSimilaritiesPerRow) {
>>         break;
>>       }
>>     }
>>     SequentialAccessSparseVector vector =
>>         new SequentialAccessSparseVector(temporaryVector);
>>     ctx.write(new IntWritable(key.getRow()), new VectorWritable(vector));
>>   }
>>
>> I am confused about whether all the other items with a similarity are
>> written into the matrix for each item or not. If only some of them (no
>> more than maxSimilaritiesPerItem) are written, how are they selected?
>> At random?
>> Thanks.
>>
>> yudong
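
For anyone reading this thread later, the "Secondary Sort" point quoted above can be illustrated with a minimal plain-Java sketch. It does not use any Hadoop or Mahout classes: `Entry`, `sortDescending`, and `keepFirst` are hypothetical names invented for this example, and the explicit sort only stands in for the ordering that the framework is said to give the reducer. Assuming the entries really do arrive in descending order of similarity, stopping after the first `maxPerRow` of them keeps the highest similarities rather than a random subset.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class TopSimilaritiesSketch {

    /** Hypothetical stand-in for a (column, value) matrix entry. */
    static final class Entry {
        final int col;
        final double val;

        Entry(int col, double val) {
            this.col = col;
            this.val = val;
        }
    }

    /** Emulates the effect of the secondary sort: highest similarity first. */
    static List<Entry> sortDescending(List<Entry> entries) {
        List<Entry> sorted = new ArrayList<>(entries);
        sorted.sort(Comparator.comparingDouble((Entry e) -> e.val).reversed());
        return sorted;
    }

    /** Same loop shape as the quoted reducer: store entries in arrival order, stop at maxPerRow. */
    static Map<Integer, Double> keepFirst(Iterable<Entry> sortedEntries, int maxPerRow) {
        Map<Integer, Double> row = new LinkedHashMap<>();
        int similaritiesSet = 0;
        for (Entry entry : sortedEntries) {
            row.put(entry.col, entry.val);
            if (++similaritiesSet == maxPerRow) {
                break;
            }
        }
        return row;
    }

    public static void main(String[] args) {
        List<Entry> entries = Arrays.asList(
                new Entry(7, 0.2), new Entry(3, 0.9), new Entry(5, 0.5), new Entry(1, 0.7));
        // prints {3=0.9, 1=0.7}
        System.out.println(keepFirst(sortDescending(entries), 2));
    }
}
```

With four candidate similarities and a limit of 2, only the columns with values 0.9 and 0.7 survive. Without the descending order, the same loop would keep whichever two entries happened to arrive first.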
