OK, thanks. So here is what I understood:

1) The input data to ALS.fit(implicitPrefs=True) is the actual strengths
(count data). So if I have a matrix of (user, item, views/purchases), I pass
that as the input, not the binarized one (preference). This signifies the
strength.

2) Since we also pass the alpha parameter to this ALS.fit() method, Spark
internally creates the confidence matrix, 1 + alpha*input_data (or some
other alpha-based scaling), as sketched below.

3) The output it gives is basically a factorization of the 0/1 matrix (the
binarized matrix from the initial input data), hence the output also
resembles the preference matrix (0/1), suggesting the interaction. So
typically it should be between 0 and 1, but if it is negative it means a
very low preference/interaction.
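
To make sure I am describing the same thing, here is a tiny numpy sketch of
that transformation (illustrative only, not Spark's actual internals; R and
alpha are made-up example values):

    import numpy as np

    # raw strengths, e.g. view counts per (user, item)
    R = np.array([[3., 0., 5.],
                  [0., 1., 0.]])
    alpha = 40.0

    P = (R > 0).astype(float)  # preference matrix, binarized to 0/1
    C = 1.0 + alpha * R        # confidence matrix, 1 + alpha * r_ui

    # ALS then learns user factors X and item factors Y such that
    # X @ Y.T approximates P, weighted by the confidences in C.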

*Does all of the above sound correct?*

If yes, then one last question:

1) *For an explicit dataset, where we don't use implicitPrefs=True,* the
predicted ratings would be actual ratings, e.g. 2.3, 4.5, etc., and not an
interaction measure. That is because in the explicit case we are not using
the confidence-matrix and preference-matrix concept; we use the actual
rating data. So any output from Spark ALS for explicit data would be a
rating prediction.
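
For example, a minimal explicit-feedback fit might look like this (a
sketch; the hyperparameter values, column names, and the train/test
DataFrames are placeholders):

    from pyspark.ml.recommendation import ALS

    # implicitPrefs defaults to False, i.e. explicit ratings
    als = ALS(rank=10, maxIter=15, regParam=0.1,
              userCol="user", itemCol="item", ratingCol="rating")
    model = als.fit(train)          # train: (user, item, rating) rows
    preds = model.transform(test)   # adds a "prediction" column on the
                                    # rating scale, e.g. 2.3, 4.5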

On Thu, Dec 15, 2016 at 3:46 PM, Sean Owen <so...@cloudera.com> wrote:

> No, input are weights or strengths. The output is a factorization of the
> binarization of that to 0/1, not probs or a factorization of the input.
> This explains the range of the output.
>
>
> On Thu, Dec 15, 2016, 23:43 Manish Tripathi <tr.man...@gmail.com> wrote:
>
>> When you say *implicit ALS is factoring the 0/1 matrix*, are you
>> saying that for the implicit feedback algorithm we need to pass the input
>> data as the preference matrix, i.e. a matrix of 0s and 1s?
>>
>> Then how will it calculate the confidence matrix, which is basically
>> 1 + alpha*count_matrix? If we don't pass the actual count values (views
>> etc.), then how does Spark calculate the confidence matrix?
>>
>> I was of the understanding that the input data for
>> als.fit(implicitPrefs=True) is the actual count matrix of the
>> views/purchases. Am I going wrong here? If yes, then how is Spark
>> calculating the confidence matrix if it doesn't have the actual count data?
>>
>> The original paper on which the Spark algorithm is based needs the actual
>> count data to create a confidence matrix, and it also needs the 0/1 matrix,
>> since the objective function uses both the confidence matrix and the 0/1
>> matrix to find the user and item factors.
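>>
>> For reference, the objective in that paper (Hu, Koren & Volinsky 2008)
>> is roughly
>>
>>     min over X, Y of  sum_{u,i} c_ui * (p_ui - x_u . y_i)^2
>>                         + lambda * (sum_u ||x_u||^2 + sum_i ||y_i||^2)
>>
>> with p_ui = 1 if r_ui > 0 else 0 and c_ui = 1 + alpha * r_ui, so it
>> needs both the binarized preferences and the raw counts.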
>>
>> On Thu, Dec 15, 2016 at 3:38 PM, Sean Owen <so...@cloudera.com> wrote:
>>
>> No, you can't interpret the output as probabilities at all. In particular
>> they may be negative. It is not predicting rating but interaction. Negative
>> means very strongly not predicted to interact. No, implicit ALS *is*
>> factoring the 0/1 matrix.
>>
>> On Thu, Dec 15, 2016, 23:31 Manish Tripathi <tr.man...@gmail.com> wrote:
>>
>> OK. So we can kind of interpret the output as probabilities, even though
>> it is not modeling probabilities, to be able to use it with the
>> BinaryClassificationEvaluator.
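>>
>> As a sketch of what I mean (names assumed: "label" is my binarized 0/1
>> column and "predictions" is the transform output):
>>
>>     from pyspark.ml.evaluation import BinaryClassificationEvaluator
>>     from pyspark.sql.functions import col
>>
>>     # ALS emits a float "prediction" column; the evaluator wants a
>>     # double (or vector) raw-prediction column, so cast it first
>>     scored = predictions.withColumn(
>>         "score", col("prediction").cast("double"))
>>
>>     evaluator = BinaryClassificationEvaluator(
>>         rawPredictionCol="score", labelCol="label",
>>         metricName="areaUnderROC")
>>     auc = evaluator.evaluate(scored)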
>>
>> The way I understand it, and as per the algorithm, the predicted matrix is
>> basically a dot product of the user-factor and item-factor matrices.
>>
>> But in what circumstances can the predicted ratings be negative? I can
>> understand that if the individual user-factor and item-factor vectors have
>> negative terms, then the dot product can be negative. But practically, does
>> a negative value make any sense? As per the algorithm, the dot product is
>> the predicted rating, so a rating shouldn't be negative for it to make any
>> sense. Also, is a rating just between 0 and 1 a normalised rating?
>> Typically we expect a rating to be any real value, like 2.3, 4.5, etc.
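>>
>> For instance (a toy numpy example, not Spark output), nothing constrains
>> the learned factors to be non-negative, so the dot product can dip below
>> zero:
>>
>>     import numpy as np
>>
>>     x_u = np.array([0.8, -0.3])   # hypothetical user factors
>>     y_i = np.array([-0.5, 0.9])   # hypothetical item factors
>>     print(x_u @ y_i)              # -0.67, a negative "prediction"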
>>
>> Also please note, for implicit-feedback ALS we don't feed a 0/1 matrix. We
>> feed the count matrix (discrete count values), and I am assuming Spark
>> internally converts it into a preference matrix (0/1) and a confidence
>> matrix = 1 + alpha*count_matrix.
>>
>>
>> On Thu, Dec 15, 2016 at 2:56 PM, Sean Owen <so...@cloudera.com> wrote:
>>
>> No, ALS is not modeling probabilities. The outputs are reconstructions of
>> a 0/1 matrix. Most values will be in [0,1], but, it's possible to get
>> values outside that range.
>>
>> On Thu, Dec 15, 2016 at 10:21 PM Manish Tripathi <tr.man...@gmail.com>
>> wrote:
>>
>> Hi
>>
>> I ran the ALS model for the implicit feedback case. Then I used the
>> .transform method of the model to predict the ratings for the original
>> dataset. My dataset is of the form (user, item, rating).
>>
>> I see something like below:
>>
>> predictions.show(5, truncate=False)
>>
>>
>> Why is the last prediction value negative? Isn't the transform method
>> giving the prediction (probability) of seeing the rating as 1? I had count
>> data for the rating (implicit feedback), and for the validation dataset I
>> binarized the rating (1 if > 0, else 0). My training data has positive
>> ratings (it's basically the count of views of a video).
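>>
>> The binarization I did was something like this sketch (assuming a
>> DataFrame with a "rating" column; "validation" is a placeholder name):
>>
>>     from pyspark.sql.functions import col, when
>>
>>     # preference label: 1 if the view count is positive, else 0
>>     validation = validation.withColumn(
>>         "label", when(col("rating") > 0, 1.0).otherwise(0.0))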
>>
>> I used the following to train:
>>
>>     als = ALS(rank=x, maxIter=15, regParam=y,
>>               implicitPrefs=True, alpha=40.0)
>>     model = als.fit(self.train)
>>
>> What does a negative prediction mean here, and is it OK to have that?
