Hi all, I had a chance to touch base briefly with Ted Dunning last weekend at the Bay Area machine learning camp. My understanding is that the main advantage of this method is that a partial SVD can be computed in a constant number of MapReduce jobs (Ted's analysis seemed to imply that number would be 4).
I've been following Mahout for perhaps a couple of months and have read the book in MEA (the first six chapters of it, anyway), and that's about it. But I have a great interest in all the work happening in this project.

While our particular business problem may, for now, be addressable by running a single-node iterative SVD (for example, an iterative Lanczos solver of the kind found in ARPACK), it is highly likely that will not remain the case for long. We also use Hadoop and its ecosystem as our platform, so Mahout naturally comes into the picture (whereas MPI does not).

Anyway, starting next week I will have to spend time on that business need, and my boss seems happy for me to contribute part of my time and results to Mahout (I guess he also expects results... eventually :-) ).

The paper seems to be the one in MAHOUT-309. I skimmed it a little, and I have some questions about how it relates to Ted's clarifications at the camp (if it is even the right paper). I will need some guidance if I am to do this, particularly on some details of Mahout and the algorithms there, and I am wondering whether my effort would be welcome. I guess my selfish desire is to make this method available in Mahout sooner.

Thank you very much.
-Dmitriy