Re: how to understand the parameter "maxSimilaritiesPerItem"

Sebastian Schelter Thu, 08 Sep 2011 05:01:25 -0700

+dev
-user

I don't understand your question. Can you give some more details?


--sebastian

On 08.09.2011 13:51, 张玉东 wrote:
> OK, I understand this point. But if the top similar items have already
> been chosen in this step, is it still necessary to select the top
> "maxSimilaritiesPerItem" items in the "mostSimilarItems" job?
> 
> -----Original Message-----
> From: Sebastian Schelter [mailto:[email protected]] 
> Sent: September 8, 2011 19:42
> To: [email protected]
> Subject: Re: how to understand the parameter "maxSimilaritiesPerItem"
> 
> The code snippet is invoked in a job that uses "Secondary Sort" which
> means that the "entries" will be seen in descending order by the
> reducer. That's why we only need to process the first ones.
> 
> --sebastian
> 
> On 08.09.2011 13:38, 张玉东 wrote:
>> Hello,
>> In the ItemSimilarityJob, the parameter "maxSimilaritiesPerItem" is first
>> used in the 7th map/reduce job "asMatrix", as follows:
>>
>>     protected void reduce(SimilarityMatrixEntryKey key,
>>                           Iterable<DistributedRowMatrix.MatrixEntryWritable> entries,
>>                           Context ctx) throws IOException, InterruptedException {
>>       RandomAccessSparseVector temporaryVector =
>>           new RandomAccessSparseVector(Integer.MAX_VALUE, maxSimilaritiesPerRow);
>>       int similaritiesSet = 0;
>>       for (DistributedRowMatrix.MatrixEntryWritable entry : entries) {
>>         temporaryVector.setQuick(entry.getCol(), entry.getVal());
>>         if (++similaritiesSet == maxSimilaritiesPerRow) {
>>           break;
>>         }
>>       }
>>       SequentialAccessSparseVector vector =
>>           new SequentialAccessSparseVector(temporaryVector);
>>       ctx.write(new IntWritable(key.getRow()), new VectorWritable(vector));
>>     }
>>
>> I am confused about whether all of the other items with a similarity are
>> written into the matrix for each item. If only some of them (no more than
>> maxSimilaritiesPerItem) are written, how are they selected? At random?
>> Thanks.
>>
>> yudong
>>
>>
>
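
The point about "Secondary Sort" in the thread can be illustrated with a small, self-contained sketch. This is not the Mahout code: the Entry class and the keepFirst method are hypothetical names, and the in-memory sort is only a stand-in for the ordering that Hadoop's secondary sort would hand to the reducer. It assumes, as stated in the reply above, that entries arrive in descending order of similarity, so stopping after the first N keeps the N most similar items rather than a random subset.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TopSimilaritiesSketch {

    /** Hypothetical stand-in for a (column, similarity value) matrix entry. */
    static final class Entry {
        final int col;
        final double val;

        Entry(int col, double val) {
            this.col = col;
            this.val = val;
        }
    }

    /**
     * Mirrors the shape of the quoted reducer loop: store entries in arrival
     * order and stop once maxPerRow of them have been stored.
     */
    static Map<Integer, Double> keepFirst(Iterable<Entry> entries, int maxPerRow) {
        Map<Integer, Double> row = new LinkedHashMap<>();
        int similaritiesSet = 0;
        for (Entry entry : entries) {
            row.put(entry.col, entry.val);
            if (++similaritiesSet == maxPerRow) {
                break;
            }
        }
        return row;
    }

    public static void main(String[] args) {
        List<Entry> entries = new ArrayList<>();
        entries.add(new Entry(7, 0.2));
        entries.add(new Entry(3, 0.9));
        entries.add(new Entry(5, 0.5));
        entries.add(new Entry(1, 0.7));

        // Assumption: this sort plays the role of the secondary sort, so the
        // loop sees the highest similarity values first.
        entries.sort((a, b) -> Double.compare(b.val, a.val));

        // Only the two highest-valued entries are kept.
        System.out.println(keepFirst(entries, 2));
    }
}
```

Without the descending order, the same loop would simply keep whichever entries happened to arrive first, which is why the ordering guarantee matters here.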
