Github user gaborhermann commented on the issue:
https://github.com/apache/flink/pull/2542
Hi @thvasilo,
Thanks for your thoughts! I agree we should perform a benchmark in the
future. Furthermore, based on the results we could optimize the algorithm.
I split up the test and rebased onto the current master. I checked the
`java.Iterable` again and commented on your original concern. I'm afraid
we'll have to use the `java.Iterable`.
Regarding the expected results, I generated the small input data by hand.
Before that, I checked whether the Spark and Flink implementations converged
to approximately the same solution (I only compared the value of the
objective function, not the whole factor matrices). Because of the random
initialization we cannot guarantee identical results, but there were 2-3
points that both Spark and Flink converged to.
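To make concrete what I compared, here is a rough, hypothetical sketch (plain Scala; the names are only illustrative and not the actual Flink ML API) of comparing runs by the value of the regularized least-squares objective rather than by the factor matrices themselves:

```scala
// Hypothetical helper: compare two ALS runs by their objective value, since
// random initialization makes the factor matrices differ even when both runs
// converge to an equally good solution.
object AlsObjective {

  type Factors = Map[Int, Array[Double]]

  // Regularized squared-error objective: sum over the known ratings of
  // (r_ij - u_i . v_j)^2, plus lambda * (||U||_F^2 + ||V||_F^2).
  def objective(
      ratings: Seq[(Int, Int, Double)],
      userFactors: Factors,
      itemFactors: Factors,
      lambda: Double): Double = {
    val squaredError = ratings.map { case (user, item, rating) =>
      val prediction = dot(userFactors(user), itemFactors(item))
      val diff = rating - prediction
      diff * diff
    }.sum
    squaredError + lambda * (frobeniusSq(userFactors) + frobeniusSq(itemFactors))
  }

  private def dot(a: Array[Double], b: Array[Double]): Double =
    a.zip(b).map { case (x, y) => x * y }.sum

  private def frobeniusSq(factors: Factors): Double =
    factors.values.map(v => v.map(x => x * x).sum).sum
}
```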
There might be better methods for testing, but I considered this sufficient,
as the original `ALSITSuite` did nothing more. Of course, this test only checks
whether the algorithm still works the same way after some modification (e.g. an
optimization); it does not check whether the algorithm works correctly in the
first place. But the same holds for the original ALS. Do you know what assures
that the explicit ALS works correctly? (It must be correct, as I also checked
the results of the explicit ALS against Spark on toy data.) AFAIK Spark
generates random matrices of known rank, factorizes them, and checks whether
the reconstruction error is low (see their
[ALSSuite](https://github.com/apache/spark/blob/master/mllib/src/test/scala/org/apache/spark/mllib/recommendation/ALSSuite.scala)).
In the future, it might be worth following their approach; a rough sketch of
that idea follows below.
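For the record, a minimal, hypothetical sketch of that ALSSuite-style test idea (plain Scala; `factorize` is just a placeholder for whatever ALS implementation is under test, not a real Flink or Spark API):

```scala
// Sketch of the ALSSuite-style check: build a rating matrix of known low rank,
// hand it to some factorization routine, and assert that the reconstruction
// error on the known entries stays below a threshold.
import scala.util.Random

object SyntheticRankTest {

  // Generates an m x n rating matrix of rank `rank` as U * V^T.
  def syntheticRatings(m: Int, n: Int, rank: Int, seed: Long): Array[Array[Double]] = {
    val rng = new Random(seed)
    val u = Array.fill(m, rank)(rng.nextGaussian())
    val v = Array.fill(n, rank)(rng.nextGaussian())
    Array.tabulate(m, n)((i, j) => u(i).zip(v(j)).map { case (x, y) => x * y }.sum)
  }

  // Root-mean-squared reconstruction error over all entries.
  def rmse(expected: Array[Array[Double]], actual: Array[Array[Double]]): Double = {
    val squaredDiffs = for {
      (rowE, rowA) <- expected.zip(actual)
      (e, a) <- rowE.zip(rowA)
    } yield (e - a) * (e - a)
    math.sqrt(squaredDiffs.sum / squaredDiffs.length)
  }

  // Skeleton of the check: factorize, reconstruct, assert low error.
  // `factorize` stands in for the ALS implementation under test.
  def check(factorize: Array[Array[Double]] => Array[Array[Double]]): Unit = {
    val ratings = syntheticRatings(m = 20, n = 15, rank = 3, seed = 42L)
    val reconstructed = factorize(ratings)
    assert(rmse(ratings, reconstructed) < 0.1, "reconstruction error too high")
  }
}
```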