This is a good start. I think that there are some things that bug me in the implementation.
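To make a few of the points in this review concrete (a private column-lookup method instead of a separate ColumnSizeCalculator object, and get/set that index the component directly rather than going through viewRow), here is a minimal sketch. It is not Mahout code and does not use the Matrix interface: the class name ConcatenatedColumnsSketch, the plain double[][] blocks, and the binary search over column offsets are all assumptions made for illustration. Whether binary search beats a linear scan for a handful of components is a question for measurement.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch, not Mahout code: several row-aligned blocks
 * presented as one wide matrix, without copying any data.
 */
final class ConcatenatedColumnsSketch {
    private final double[][][] blocks;  // blocks[b][row][localColumn]
    private final int[] columnOffsets;  // first global column of each block
    private final int rows;
    private final int columns;

    ConcatenatedColumnsSketch(double[][]... blocks) {
        if (blocks.length == 0 || blocks[0].length == 0) {
            throw new IllegalArgumentException("need at least one non-empty block");
        }
        this.blocks = blocks;
        this.rows = blocks[0].length;
        this.columnOffsets = new int[blocks.length];
        int total = 0;
        for (int b = 0; b < blocks.length; b++) {
            // Zero-width blocks are rejected so that offsets stay strictly increasing.
            if (blocks[b].length != rows || blocks[b][0].length == 0) {
                throw new IllegalArgumentException("block " + b + " has a bad shape");
            }
            columnOffsets[b] = total;
            total += blocks[b][0].length;
        }
        this.columns = total;
    }

    /** The lookup machinery lives in one private method, not a helper object. */
    private int blockFor(int column) {
        if (column < 0 || column >= columns) {
            throw new IndexOutOfBoundsException("column " + column);
        }
        int pos = Arrays.binarySearch(columnOffsets, column);
        // On a miss, binarySearch returns -(insertionPoint) - 1; the block
        // containing the column is the one just before the insertion point.
        return pos >= 0 ? pos : -pos - 2;
    }

    /** Direct element access: no intermediate row view is created. */
    double get(int row, int column) {
        int b = blockFor(column);
        return blocks[b][row][column - columnOffsets[b]];
    }

    /** Writes go straight through to the underlying block. */
    void set(int row, int column, double value) {
        int b = blockFor(column);
        blocks[b][row][column - columnOffsets[b]] = value;
    }

    int rowSize() {
        return rows;
    }

    int columnSize() {
        return columns;
    }
}
```

With a 2x2 block followed by a 2x3 block, global column 2 resolves to local column 0 of the second block, and a set on the view is visible in the original array because nothing is copied.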

- assignColumn should work the same way that viewColumn does.

- the machinery that finds the component matrix for a particular column
should be separated out as a private method.

- I think that the ColumnSizeCalculator class should go away. You don't
need an extra object there, just a method.

- I strongly suspect that you don't need to implement VectorSuperView.
Won't the normal handling of viewRow in AbstractMatrix work here? Speed may
be an issue, but all speed questions should be decided by measurements.

- viewPart and like() are important.

- set and get should not be implemented on top of viewRow. That will kill
performance.

- exposing the public
MatrixSuperView(int rowSize, int columnSize, Matrix[] matrices)
constructor to users makes no sense to me. It should be inlined and go away.

- the whitespace in the code is erratic. Your IDE should fix this.


On Sun, Apr 14, 2013 at 4:35 AM, Gokhan Capan <gkhn...@gmail.com> wrote:

> Ted,
>
> I wrote one yesterday. Basically it is a view implementing matrix, which
> allows viewing and iterating on rows as if they are concatenated, via
> VectorSuperView.
>
> Class naming can definitely change though.
>
> I'll change the LuceneMatrix code to return single matrix for multiple
> fields (using this view), too.
>
> Could you have a look at this (only the matrix and vector views) so I
> submit a diff (after handling labels), refactor and resubmit LuceneMatrix
> patch, and then continue to work on Factorization Machines so it can
> operate on a single matrix?
>
> The code is here (Adding exact locations for each related new class
> because I did a kind of bad commit, from the top directory)
>
> https://github.com/gcapan/mahout/blob/fm/math/src/main/java/org/apache/mahout/math/MatrixSuperView.java
>
> https://github.com/gcapan/mahout/blob/fm/math/src/main/java/org/apache/mahout/math/VectorSuperView.java
>
> https://github.com/gcapan/mahout/blob/fm/math/src/test/java/org/apache/mahout/math/MatrixSuperViewTest.java
>
> https://github.com/gcapan/mahout/blob/fm/math/src/test/java/org/apache/mahout/math/VectorSuperViewTest.java
>
>
> On Sat, Apr 13, 2013 at 10:05 AM, Ted Dunning <ted.dunn...@gmail.com> wrote:
>
>> What would this MatrixSuperView do? Would ConcatenatedMatrix be a better
>> name?
>>
>> Sent from my iPhone
>>
>> On Apr 12, 2013, at 1:26, Gokhan Capan <gkhn...@gmail.com> wrote:
>>
>> > Ted,
>> >
>> > How about a MatrixSuperView implements Matrix? (A MatrixView like
>> > implementation)
>> >
>> >
>> > On Fri, Apr 12, 2013 at 2:28 AM, Gokhan Capan <gkhn...@gmail.com> wrote:
>> > So if I understood correctly, the algorithm still runs on matrix, and a
>> > client still can pass a group of matrices.
>> >
>> > Again it came to data preparation:)
>> >
>> > I will refactor the implementation to run on single matrix, but provide
>> > tools for turning the obvious client data into actual input to the
>> > algorithm.
>> >
>> > Sent from my iPhone
>> >
>> > On Apr 12, 2013, at 1:13, Ted Dunning <ted.dunn...@gmail.com> wrote:
>> >
>> >> One easy thing to do is to build an adjoined matrix type that does the
>> >> concatenation on the fly.
>> >>
>> >>
>> >> On Thu, Apr 11, 2013 at 1:43 PM, Gokhan Capan <gkhn...@gmail.com> wrote:
>> >> Yeah, it is simpler indeed.
>> >>
>> >> I am going to think about alternative ways to make concatenation
>> >> easier for clients.
>> >>
>> >> Thanks for your review
>> >>
>> >>
>> >> On Thu, Apr 11, 2013 at 10:45 PM, Robin Anil <robin.a...@gmail.com> wrote:
>> >> I would have folded them all as different feature ids in a single
>> >> vector, makes things a lot simpler and faster.
>> >>
>> >> Robin Anil | Software Engineer | +1 312 869 2602 | Google Inc.
>> >>
>> >>
>> >> On Thu, Apr 11, 2013 at 11:19 AM, Gokhan Capan <gkhn...@gmail.com> wrote:
>> >> Hi Robin,
>> >>
>> >> If you are asking why they are arrays, it is because to save clients
>> >> from concatenating multiple matrices to create the input.
>> >>
>> >> I am quoting from libFM paper: "For easier interpretation,
>> >> the features are grouped into indicators for the active user (blue),
>> >> active item (red), other movies rated
>> >> by the same user (orange), the time in months (green), and the last
>> >> movie rated (brown)."
>> >>
>> >> I thought a client would create multiple group of matrices, and he can
>> >> just pass them all to the algorithm.
>> >>
>> >> Then the wModel is w parameters, it is still array of vectors for me
>> >> to keep the indexing consistent, and vModel is the V parameters.
>> >>
>> >> Was that what you were asking?
>> >>
>> >>
>> >> On Thu, Apr 11, 2013 at 6:44 PM, Robin Anil <robin.a...@gmail.com> wrote:
>> >> Comments away. I was a bit confused by the use of Vector[] for w1 and
>> >> Matrix[] for inputs.
>> >>
>> >> Robin Anil | Software Engineer | +1 312 869 2602 | Google Inc.
>> >>
>> >>
>> >> On Thu, Apr 11, 2013 at 10:00 AM, Gokhan Capan <gkhn...@gmail.com> wrote:
>> >> Ted,
>> >> Robin,
>> >>
>> >> Although I did not test on a dataset yet, recently I've been
>> >> implementing Factorization Machines with SGD optimization.
>> >>
>> >> The initial implementation is at
>> >> https://github.com/gcapan/mahout/tree/fm
>> >>
>> >> Would you guys consider to take a look so I can make it better and
>> >> running?
>> >>
>> >>
>> >>
>> >> On Mon, Apr 1, 2013 at 8:45 PM, Nkechi Nnadi <nkechi.nn...@gmail.com> wrote:
>> >> Hello,
>> >>
>> >> I'm long time lurker. I would be interested in implementing these. I
>> >> thought I would get my feet wet with contributing to wiki with tutorials
>> >> since I have used Mahout for recommendation and clustering in my
>> >> dissertation. I have never contributed code before and I would love to
>> >> start now.
>> >>
>> >> -Nkechi
>> >>
>> >>
>> >> On Sun, Mar 31, 2013 at 1:14 PM, Robin Anil <robin.a...@gmail.com> wrote:
>> >>
>> >> > FMs work really well for a whole range of things. Having implemented them
>> >> > myself, I can extend my services as a reviewer if anyone is willing to
>> >> > start on it.
>> >> >
>> >> > Robin Anil | Software Engineer | +1 312 869 2602 | Google Inc.
>> >> >
>> >> >
>> >> > On Sun, Mar 31, 2013 at 2:18 AM, Ted Dunning <ted.dunn...@gmail.com>
>> >> > wrote:
>> >> >
>> >> > > Relative to Dan's recent mention of SOM as possible new project, here are
>> >> > > slides from KDD Cup 2012 in which Steffen Rendle describes how he did using
>> >> > > a very straightforward implementation of Factorization Machines [1,2].
>> >> > >
>> >> > >
>> >> > > FMs are interesting in the context of Mahout because they can be used in a
>> >> > > wide variety of settings including recommendation and targeting and because
>> >> > > they have very good performance on a number of tasks.
>> >> > >
>> >> > > I should mention that Robin was the one who first mentioned FMs to me.
>> >> > >
>> >> > > The KDD 2012 competition [3] is of interest in any case because it provides
>> >> > > a large amount of realistic data for commercially important problems.
>> >> > >
>> >> > > [1] https://kaggle2.blob.core.windows.net/competitions/kddcup2012/2748/media/RendleSlides.pdf
>> >> > >
>> >> > > [2] https://kaggle2.blob.core.windows.net/competitions/kddcup2012/2748/media/Rendle.pdf
>> >> > >
>> >> > > [3] http://www.kddcup2012.org/
>> >> > >
>> >> >
>> >>
>> >> --
>> >> Gokhan
>> >>
>> >
>> > --
>> > Gokhan
>
> --
> Gokhan