In their example, docs were rows and words were columns. The terms of
the inner products they computed came from processing the posting
lists / columns instead of rows and emitting all pairs of docs
containing a word. Sounds like they just tossed the posting list for
common words. Anyway, that's why I said cols, and I think that's right. At
least, that is what RowSimilarityJob is doing.
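To make the column-wise idea concrete, here is a minimal sketch (not Mahout's RowSimilarityJob code, just an illustration under stated assumptions). It builds a posting list per word (column), emits every pair of docs that share the word, and sums each word's contribution to that pair's inner product. The input shape (`docs` as a dict of doc id to term weights) and the `max_df` cutoff for tossing a common word's posting list are hypothetical choices for the example.

```python
from collections import defaultdict
from itertools import combinations


def pairwise_similarity(docs, max_df=None):
    """Sparse doc-doc inner products computed column by column.

    docs:   hypothetical input, dict of doc id -> dict of term -> weight
            (one row of the doc x word matrix per doc).
    max_df: hypothetical cutoff; posting lists longer than this are
            dropped entirely, as with very common words.
    """
    # Build posting lists: each column (term) lists the docs containing it.
    postings = defaultdict(list)
    for doc_id, terms in docs.items():
        for term, weight in terms.items():
            postings[term].append((doc_id, weight))

    # For each column, emit every pair of docs sharing the term and add
    # that term's product of weights to the pair's running inner product.
    sims = defaultdict(float)
    for plist in postings.values():
        if max_df is not None and len(plist) > max_df:
            continue  # toss the posting list for a common word
        # Sorting by doc id makes each pair key canonical: (smaller, larger).
        for (d1, w1), (d2, w2) in combinations(sorted(plist), 2):
            sims[(d1, d2)] += w1 * w2
    return dict(sims)


docs = {
    "a": {"x": 1.0, "y": 2.0},
    "b": {"x": 3.0, "z": 1.0},
    "c": {"x": 1.0, "y": 1.0},
}
print(pairwise_similarity(docs))            # all columns used
print(pairwise_similarity(docs, max_df=2))  # column "x" (3 docs) is tossed
```

The cost is driven by the longest posting lists, since a column with n docs emits n(n-1)/2 pairs. That is why dropping common words makes the approach tractable, and also why it changes the resulting similarities.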

On Thu, Jul 14, 2011 at 10:05 PM, Ted Dunning <[email protected]> wrote:
> Rows.
>
> On Thu, Jul 14, 2011 at 12:24 PM, Sean Owen <[email protected]> wrote:
>
>> Just needs a rule for
>> tossing data -- you could simply throw away such columns (ouch), or at
>> least use only a sampled subset of it.
>>
>
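The gentler option quoted above, using only a sampled subset of a long column instead of throwing it away, could look roughly like the sketch below. `sample_posting_list` and `max_len` are hypothetical names. Uniform sampling without replacement is one assumed choice of sampling rule among several.

```python
import random


def sample_posting_list(plist, max_len, seed=None):
    """Keep at most max_len entries of a posting list, chosen uniformly at random.

    Hypothetical alternative to dropping a common word's column outright:
    the column still contributes, but only max_len of its docs get paired,
    so it emits at most max_len * (max_len - 1) / 2 pairs.
    """
    if len(plist) <= max_len:
        return list(plist)  # short lists are kept whole
    rng = random.Random(seed)  # seeded for reproducible runs
    return rng.sample(plist, max_len)


plist = [(doc_id, 1.0) for doc_id in range(1000)]
kept = sample_posting_list(plist, max_len=50, seed=42)
print(len(kept))
```

Pairs whose docs were not sampled lose that word's contribution, so similarities from a sampled column undercount unless they are reweighted in some way.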
