What's a row here, a user? I completely agree, but then this describes how
you start item-item similarity computation, where items are columns, right?
The job here is turned on its side, computing row similarity.
On Jul 14, 2011 11:21 PM, "Ted Dunning" <[email protected]> wrote:
> The problem arises when the program is reading a single row and emitting
> all of the cooccurring items. The number of items emitted is the square of
> the number of items in a row. Thus, it is more dense rows that cause the
> problem.
>
> On Thu, Jul 14, 2011 at 2:25 PM, Sean Owen <[email protected]> wrote:
>
>> In their example, docs were rows and words were columns. The terms of
>> the inner products they computed came from processing the posting
>> lists / columns instead of rows and emitting all pairs of docs
>> containing a word. Sounds like they just tossed the posting list for
>> common words. Anyway that's why I said cols and think that's right. At
>> least, that is what RowSimilarityJob is doing.
>>
>> On Thu, Jul 14, 2011 at 10:05 PM, Ted Dunning <[email protected]>
>> wrote:
>> > Rows.
>> >
>> > On Thu, Jul 14, 2011 at 12:24 PM, Sean Owen <[email protected]> wrote:
>> >
>> >> Just needs a rule for
>> >> tossing data -- you could simply throw away such columns (ouch), or at
>> >> least use only a sampled subset of it.
>> >>
>> >
>>
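
For illustration only, here is a minimal sketch of the two points made in
the thread: emitting every cooccurring pair from one row (or one posting
list) grows roughly as the square of its length, and a sampling rule caps
that cost. This is not the Mahout code; the names emit_pairs, pair_count
and max_per_row are hypothetical, and the fixed random seed is an
assumption made only so the example is repeatable.

```python
import random
from itertools import combinations


def pair_count(n):
    """Number of unordered pairs among n items: n*(n-1)/2, i.e. O(n^2)."""
    return n * (n - 1) // 2


def emit_pairs(row, max_per_row=None, rng=None):
    """Return all unordered pairs of entries that cooccur in one row.

    If max_per_row is given and the row is longer than that, a uniform
    random sample of max_per_row entries is used instead of the full row
    (hypothetical down-sampling rule, not taken from any library).
    """
    items = list(row)
    if max_per_row is not None and len(items) > max_per_row:
        rng = rng or random.Random(0)  # assumed seed, for repeatability
        items = rng.sample(items, max_per_row)
    return list(combinations(items, 2))


if __name__ == "__main__":
    dense_row = range(1000)
    print(pair_count(len(dense_row)))                   # pairs without sampling
    print(len(emit_pairs(dense_row, max_per_row=100)))  # pairs after sampling
```

Whether the unit being sampled is a row or a column depends on which way
the matrix is oriented, which is the question under discussion above.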
