From: [email protected]
To: [email protected]
Subject: RE: JIRA issues 1248/1249
Date: Wed, 8 Jan 2014 20:17:04 -0800
I read through 1249 and had some initial questions before coming up with a
plan. I was looking through ParallelALSFactorizationJob.java and am assuming
this is the right place to make all the changes. To this end:
1) I was thinking of introducing a convergence threshold on the training error
as a configuration parameter, to replace the number of iterations.
2) For the chunk of code below:
for (int currentIteration = 0; currentIteration < numIterations; currentIteration++) {
  /* broadcast M, read A row-wise, recompute U row-wise */
  log.info("Recomputing U (iteration {}/{})", currentIteration, numIterations);
  runSolver(pathToUserRatings(), pathToU(currentIteration), pathToM(currentIteration - 1),
      currentIteration, "U", numItems);
  /* broadcast U, read A' row-wise, recompute M row-wise */
  log.info("Recomputing M (iteration {}/{})", currentIteration, numIterations);
  runSolver(pathToItemRatings(), pathToM(currentIteration), pathToU(currentIteration),
      currentIteration, "M", numUsers);
}
I am proposing we have a while loop similar to the following:
int currentIteration = 0;
/* keep iterating while the error is still above the convergence threshold */
while (currentTrainingError > specifiedTrainingErrorForConvergence) {
  /* broadcast M, read A row-wise, recompute U row-wise */
  log.info("Recomputing U (iteration {})", currentIteration);
  runSolver(pathToUserRatings(), pathToU(currentIteration), pathToM(currentIteration - 1),
      currentIteration, "U", numItems);
  /* broadcast U, read A' row-wise, recompute M row-wise */
  log.info("Recomputing M (iteration {})", currentIteration);
  runSolver(pathToItemRatings(), pathToM(currentIteration), pathToU(currentIteration),
      currentIteration, "M", numUsers);
  /* currentTrainingError has to be recomputed here; how to do that is my open question */
  currentIteration++;
}
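One caveat with a loop driven only by the training error is that it never terminates if the error stalls above the threshold, so a cap on the number of iterations is probably still worth keeping. Below is a minimal standalone sketch of that stopping rule. It is plain Java, not Mahout code: ConvergenceLoop, runUntilConverged, errorThreshold, and maxIterations are hypothetical names, and the lambda merely stands in for one U/M recomputation that reports its error.

```java
import java.util.function.IntToDoubleFunction;

final class ConvergenceLoop {

    /**
     * Calls step.applyAsDouble(iteration) until the error it returns is at or
     * below errorThreshold, or until maxIterations steps have run.
     * Returns the number of iterations that were executed.
     */
    static int runUntilConverged(IntToDoubleFunction step, double errorThreshold, int maxIterations) {
        int iteration = 0;
        double error = Double.POSITIVE_INFINITY; // forces at least one iteration
        while (error > errorThreshold && iteration < maxIterations) {
            error = step.applyAsDouble(iteration);
            iteration++;
        }
        return iteration;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for one ALS iteration: the error halves each time.
        IntToDoubleFunction fakeStep = i -> 1.0 / (1 << (i + 1));
        System.out.println(runUntilConverged(fakeStep, 0.1, 20));
    }
}
```

In the real job the step would be the two runSolver calls plus whatever computes the error, and the existing numIterations parameter could serve as the cap.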
However, I am wondering where and how I would compute the training error on
each iteration. Would that happen inside runSolver, or would it be an artifact
of the solver computation? Pardon my ignorance on this.
I also wanted to get deeper insight into ALS. Is the following the best paper
to read?
http://www.hpl.hp.com/personal/Robert_Schreiber/papers/2008%20AAIM%20Netflix/netflix_aaim08(submitted).pdf
Specifically, I am trying to understand where the training error comes into
play within the SVD computation.
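To make the question concrete, here is my current understanding of what "training error" would mean: the root-mean-square error between the known ratings and the predictions given by the dot product of the corresponding user and item feature vectors. The sketch below is plain in-memory Java and an assumption on my part, not anything taken from Mahout: TrainingError, rmse, and the array layout are hypothetical, and in the actual job this would presumably have to be a distributed pass over the ratings after U and M are recomputed.

```java
final class TrainingError {

    /**
     * Root-mean-square error over the observed ratings only.
     * U[u] is the feature vector of user u, M[i] the feature vector of item i.
     * The k-th observed rating is values[k], given by user users[k] to item items[k].
     */
    static double rmse(double[][] U, double[][] M, int[] users, int[] items, double[] values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("need at least one observed rating");
        }
        double sumOfSquares = 0.0;
        for (int k = 0; k < values.length; k++) {
            double[] userFeatures = U[users[k]];
            double[] itemFeatures = M[items[k]];
            // predicted rating = dot product of the two feature vectors
            double predicted = 0.0;
            for (int f = 0; f < userFeatures.length; f++) {
                predicted += userFeatures[f] * itemFeatures[f];
            }
            double diff = values[k] - predicted;
            sumOfSquares += diff * diff;
        }
        return Math.sqrt(sumOfSquares / values.length);
    }

    public static void main(String[] args) {
        double[][] U = {{1.0, 0.0}, {0.0, 1.0}};
        double[][] M = {{2.0, 0.0}, {0.0, 3.0}};
        // user 0 rated item 0 with 4.0, user 1 rated item 1 with 3.0
        System.out.println(rmse(U, M, new int[] {0, 1}, new int[] {0, 1}, new double[] {4.0, 3.0}));
    }
}
```

If that reading is right, the error is not something the per-row solver produces by itself; it would need an extra pass over the ratings with both U and M available.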
I would really appreciate some more insight as I explore and dig through the code.
Regards
Date: Tue, 7 Jan 2014 09:11:17 +0100
From: [email protected]
To: [email protected]
Subject: Re: JIRA issues 1248/1249
Hi Saikat,
I suggest starting with 1249, which is the easier task. The best way to
proceed is by discussing on the mailing list. Have a look at the issue,
propose a solution here, and wait for our feedback.
Best,
Sebastian
On 07.01.2014 04:27, Saikat Kanjilal wrote:
Sebastian et al,
After months of not having bandwidth to help out with coding tasks, I am
finally ready to help with the implementation of the above JIRA issues. Before
I begin, I wanted to make sure these improvements are still needed for ALS. I
am targeting to finish these by the 1.0 release. Also, if these are relevant,
should I just present a design/plan of implementation? I'd love some initial
guidance and thoughts around these tasks; feel free to add them to the tickets
themselves.
Thanks in advance.