In your experience using implicit factorization for document clustering, how did you tune alpha? Using perplexity measures, or just something simple like 1 + rating, since the ratings are always positive in this case?
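Not an answer to the perplexity question, but for concreteness, here is a minimal sketch of the kind of alpha sweep I have in mind, using MLlib's ALS.trainImplicit. The docWordCounts RDD, the hold-out split, and the "mean predicted preference on held-out positives" score are all illustrative assumptions, not anything from this thread; a ranking metric such as expected percentile rank or MAP would be a more principled way to compare alphas.

import org.apache.spark.mllib.recommendation.{ALS, Rating}
import org.apache.spark.rdd.RDD

// docWordCounts: RDD of Rating(docId, wordId, count), built elsewhere from the
// document x word matrix -- purely illustrative here.
def sweepAlpha(docWordCounts: RDD[Rating]): Unit = {
  val Array(train, heldOut) =
    docWordCounts.randomSplit(Array(0.9, 0.1), seed = 42L)
  val heldOutPairs = heldOut.map(r => (r.user, r.product)).cache()

  for (alpha <- Seq(0.01, 0.1, 1.0, 10.0, 40.0)) {
    // rank = 50, iterations = 10, lambda = 0.01 are placeholder settings
    val model = ALS.trainImplicit(train, 50, 10, 0.01, alpha)
    // Crude proxy: how strongly the model scores held-out observed (doc, word)
    // pairs. It ignores how unobserved pairs are scored, so treat it only as
    // a starting point for the sweep.
    val meanScore = model.predict(heldOutPairs).map(_.rating).mean()
    println(s"alpha=$alpha mean held-out preference=$meanScore")
  }
}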
On Sun, Jul 26, 2015 at 1:23 AM, Sean Owen <so...@cloudera.com> wrote:
> It sounds like you're describing the explicit case, or any matrix
> decomposition. Are you sure that's best for count-like data? "It depends,"
> but my experience is that the implicit formulation is better. In a way, the
> difference between counts of 10,000 and 1,000 is less significant than the
> difference between 1 and 10. However, if your loss function penalizes the
> square of the error, then the former case not only matters more for the
> same relative error, it matters 10x more than the latter. It's very heavily
> skewed to pay attention to the high-count instances.
>
>
> On Sun, Jul 26, 2015 at 9:19 AM, Debasish Das <debasish.da...@gmail.com>
> wrote:
> > Yeah, I think the idea of confidence is a bit different from what I am
> > looking for in using implicit factorization to do document clustering.
> >
> > I basically need (r_ij - w_i h_j)^2 for all observed ratings and
> > (0 - w_i h_j)^2 for all the unobserved ratings... Think about the
> > document x word matrix where r_ij is the count that's observed, and 0 are
> > the word counts that are not in a particular document.
> >
> > The broadcasted value of the gram matrix w_i'w_i or h_j'h_j will also
> > count the r_ij that are observed... So I might be fine using the
> > broadcasted gram matrix and the linear term as \sum (-r_ij w_i) or
> > \sum (-r_ij h_j)...
> >
> > I will think further, but in the current implicit formulation with
> > confidence, it looks like I am really factorizing a 0/1 matrix with
> > weights 1 + alpha*rating. It's a bit different from the LSA model.
> >
> >
> > On Sun, Jul 26, 2015 at 12:34 AM, Sean Owen <so...@cloudera.com> wrote:
> >>
> >> confidence = 1 + alpha * |rating| here (so c1 means confidence - 1),
> >> so alpha = 1 doesn't specially mean high confidence. The loss function
> >> is computed over the whole input matrix, including all missing "0"
> >> entries. These have a minimal confidence of 1 according to this formula.
> >> alpha controls how much more confident you are in what the entries that
> >> do exist in the input mean. So alpha = 1 is low-ish and means you don't
> >> think the existence of ratings means a lot more than their absence.
> >>
> >> I think the explicit case is similar here, but not identical. The cost
> >> function for the explicit case is not the same, which is the more
> >> substantial difference between the two. There, ratings aren't inputs to
> >> a confidence value that becomes a weight in the loss function for a
> >> factorization of a 0/1 matrix. Instead, the rating matrix is the thing
> >> being factorized directly.
> >>
> >> On Sun, Jul 26, 2015 at 6:45 AM, Debasish Das <debasish.da...@gmail.com>
> >> wrote:
> >> > Hi,
> >> >
> >> > Implicit factorization is important for us since it drives
> >> > recommendation when modeling user click/no-click and also topic
> >> > modeling to handle 0 counts in document x word matrices through NMF
> >> > and Sparse Coding.
> >> >
> >> > I am a bit confused by this code:
> >> >
> >> > val c1 = alpha * math.abs(rating)
> >> > if (rating > 0) ls.add(srcFactor, (c1 + 1.0) / c1, c1)
> >> >
> >> > When alpha = 1.0 (high confidence) and rating is > 0 (true for word
> >> > counts), why does this formula not become the same as the explicit
> >> > formula:
> >> >
> >> > ls.add(srcFactor, rating, 1.0)
> >> >
> >> > For modeling documents, I believe the implicit Y'Y needs to stay, but
> >> > we need the explicit ls.add(srcFactor, rating, 1.0).
> >> >
> >> > I am studying the confidence code further. Please let me know if the
> >> > idea of mapping implicit to handle 0 counts in the document x word
> >> > matrix makes sense.
> >> >
> >> > Thanks.
> >> > Deb
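For what it's worth, my reading of that snippet (a sketch of the algebra, not the actual MLlib internals): add(a, b, c) accumulates c * a * a' into the Gram matrix and c * b * a into the right-hand side. Since Y'Y is added once for the whole matrix (every entry carries the baseline confidence of 1), an observed entry only needs to contribute the correction (c_ui - 1) * y_i y_i' = c1 * y_i y_i' plus c_ui * p_ui * y_i = (c1 + 1) * y_i, and add(srcFactor, (c1 + 1.0)/c1, c1) produces exactly that. The explicit path instead adds y_i y_i' and rating * y_i per observed entry, so the two don't coincide even at alpha = 1. A simplified stand-in for the accumulator, under those assumptions:

// Simplified stand-in: add(a, b, c) adds c * a * a' to the Gram matrix and
// c * b * a to the right-hand side of the k x k least-squares system.
class NormalEquationSketch(k: Int) {
  val ata = Array.ofDim[Double](k * k)  // Gram matrix, row-major
  val atb = Array.ofDim[Double](k)      // right-hand side

  def add(a: Array[Double], b: Double, c: Double = 1.0): this.type = {
    var i = 0
    while (i < k) {
      var j = 0
      while (j < k) { ata(i * k + j) += c * a(i) * a(j); j += 1 }
      atb(i) += c * b * a(i)
      i += 1
    }
    this
  }
}

// Explicit case, per observed entry:
//   ls.add(y_i, rating)                // ata += y y',      atb += rating * y
// Implicit case, per observed entry (Y'Y already added for all entries):
//   val c1 = alpha * math.abs(rating)
//   ls.add(y_i, (c1 + 1.0) / c1, c1)   // ata += c1 * y y', atb += (c1 + 1) * y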