Hi Jerry,

Why don't you submit a pull request so we can discuss it there? If SimRank is not common enough, we might take in just the matrix multiplication method and merge that. At the very least, even if SimRank doesn't get merged into Spark, we can include a contrib package or a Wiki page that links to examples of various algorithms that community members have implemented.
On Thu, Jan 9, 2014 at 9:29 PM, Shao, Saisai <[email protected]> wrote:
> Hi All,
>
> We would like to contribute the SimRank algorithm to MLlib. SimRank
> calculates a similarity score between two objects based on graph
> structure; details can be found in
> (http://ilpubs.stanford.edu:8090/508/1/2001-41.pdf). We implemented a
> matrix multiplication method based on the basic algorithm; the matrix
> multiplication method is described in chapter 4.1 of
> (http://www.cse.unsw.edu.au/~zhangw/files/wwwj.pdf).
>
> The implementation is abstracted and generalized from our customer's real
> use case, and we made some tradeoffs to improve speed and reduce shuffle
> size. We wonder whether this algorithm would be suitable for MLlib.
> What else should we take care of?
>
> Any suggestions would be really appreciated.
>
> Thanks
> Jerry
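For readers unfamiliar with the matrix form, here is a minimal single-machine sketch of basic SimRank written as repeated matrix products. It assumes the formulation S <- C * W^T S W followed by resetting the diagonal to 1, where W is the column-normalized adjacency matrix (W[i][a] = 1/|I(a)| when there is an edge i -> a). This matches the basic SimRank recurrence, but it is not the contributors' Spark implementation and may differ in detail from chapter 4.1 of the cited paper. The names `simrank` and `matmul`, the decay factor `c=0.8`, and the fixed `iterations=10` (used in place of a convergence check) are all hypothetical choices for illustration. It uses dense Python lists, so it only suits toy graphs.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Dense product a @ b with plain lists (O(n^3); toy graphs only)."""
    inner = len(b)
    cols = len(b[0])
    return [
        [sum(a[i][t] * b[t][j] for t in range(inner)) for j in range(cols)]
        for i in range(len(a))
    ]


def simrank(edges: Sequence[Tuple[int, int]], n: int,
            c: float = 0.8, iterations: int = 10) -> Matrix:
    """Hypothetical helper: basic SimRank as S <- C * W^T S W, diagonal reset to 1.

    W is column-normalized: W[i][a] = 1/|I(a)| for each in-neighbor i of a.
    """
    in_nbrs = [set() for _ in range(n)]
    for src, dst in edges:
        in_nbrs[dst].add(src)  # a set ignores duplicate edges

    w = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for i in in_nbrs[a]:
            w[i][a] = 1.0 / len(in_nbrs[a])
    w_t = [list(row) for row in zip(*w)]  # transpose of W

    # S_0 = I: every node starts fully similar only to itself.
    s = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(iterations):
        m = matmul(matmul(w_t, s), w)  # W^T S W
        # Scale off-diagonal entries by C; a node is always fully similar to itself.
        s = [[1.0 if a == b else c * m[a][b] for b in range(n)] for a in range(n)]
    return s


# Node 0 points to nodes 1 and 2, so 1 and 2 share their only in-neighbor.
s = simrank([(0, 1), (0, 2)], n=3)
print(round(s[1][2], 6))  # 0.8
```

Each entry of W^T S W averages S[i][j] over all in-neighbor pairs (i of a, j of b), which is exactly the double sum in the SimRank definition; nodes with no in-neighbors get a zero column, so their similarity to any other node stays 0. A distributed version would replace the dense products with block-partitioned ones, which is presumably where the shuffle-size tradeoffs mentioned above come in.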
