Re: Does RowSimilarity job support down-sampling

Sean Owen Tue, 18 Jun 2013 13:35:45 -0700

This is the "maxPrefsPerUser" option IIRC.


On Tue, Jun 18, 2013 at 9:27 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
> I was reading the RowSimilarityJob and it doesn't appear that it does
> down-sampling on the original data to minimize the performance impact of
> perversely prolific users.
>
> The issue is that if a single user has 100,000 items in their history, we
> learn nothing more than if we picked 300 of those, while the former would
> result in processing 10 billion cooccurrences and the latter would result
> in about 100,000.  This factor of roughly 100,000 is so large that it can
> make a big difference in performance.
>
> I had thought that the code had this down-sampling in place.
>
> If not, I can add row based down-sampling quite easily.
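
For illustration, the arithmetic in the quoted message and the kind of row-based down-sampling it proposes can be sketched as follows. This is a minimal sketch, not Mahout's code: the function names `cooccurrence_pairs` and `downsample_row` are hypothetical, the cap of 300 is taken from the example in the message, and uniform random sampling is an assumption about how the rows would be thinned.

```python
import random


def cooccurrence_pairs(n_items):
    """Ordered item pairs contributed by one user with n_items in their history."""
    return n_items * n_items


def downsample_row(items, max_items, rng):
    """Keep at most max_items items from one user's history (hypothetical helper).

    Assumption: items are chosen uniformly at random without replacement.
    """
    if len(items) <= max_items:
        return list(items)
    return rng.sample(items, max_items)


rng = random.Random(42)            # fixed seed so the sketch is repeatable
history = list(range(100_000))     # one perversely prolific user
sampled = downsample_row(history, 300, rng)

full_cost = cooccurrence_pairs(len(history))     # 100,000 squared
sampled_cost = cooccurrence_pairs(len(sampled))  # 300 squared
print(full_cost, sampled_cost, full_cost // sampled_cost)
```

With these numbers the full history yields 10,000,000,000 pairs and the sampled one 90,000, which is where the "10 billion" and "about 100,000" figures come from. Users with fewer items than the cap pass through unchanged, so only the prolific rows are affected.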
