Github user gaborhermann commented on the issue:

    https://github.com/apache/flink/pull/2542
  
    Hi @thvasilo,
    
    Thanks for your thoughts! I agree we should perform a benchmark in the 
future. Furthermore, based on the results we could optimize the algorithm.
    
    I split up the test and rebased onto the current master. I checked the 
`java.Iterable` again and commented on your original concern. I am afraid 
we'll have to use the `java.Iterable`.
    
    Regarding the expected results, I've only generated the small input data by 
hand. Before that, I checked whether the Spark and Flink implementations 
converged to approximately the same factor matrices (I only compared the value of 
the objective function, not the whole matrices). Because of the random 
initialization we cannot guarantee identical results, but there were 2-3 
points that both Spark and Flink converged to.
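    
    A small sketch of why comparing the objective value is a reasonable 
proxy for comparing factor matrices: the factors themselves are not unique. 
The snippet assumes a plain L2-regularized squared-error objective with 
2-dimensional factors; the objective actually used in this PR may differ (for 
example in how the regularization is weighted), and all names and numbers here 
are hypothetical toy values.

```python
import math

def objective(ratings, X, Y, lam):
    """Regularized squared error over observed (user, item, rating) triples.

    Assumption: plain L2 regularization, not necessarily what Flink or Spark use.
    """
    err = sum((r - sum(a * b for a, b in zip(X[u], Y[i]))) ** 2
              for u, i, r in ratings)
    reg = sum(v * v for row in X + Y for v in row)
    return err + lam * reg

def rotate(M, theta):
    """Rotate every 2-dimensional factor vector by the same angle."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c * a - s * b, s * a + c * b] for a, b in M]

# Hypothetical toy data: (user, item, rating) triples and rank-2 factors.
ratings = [(0, 0, 5.0), (0, 1, 1.0), (1, 0, 4.0), (1, 1, 2.0)]
X = [[1.0, 2.0], [0.5, 1.5]]   # user factors
Y = [[2.0, 1.0], [0.0, 1.0]]   # item factors

# Rotating both factor matrices changes every entry but preserves all
# dot products and norms, so the objective value stays the same.
X2, Y2 = rotate(X, 0.7), rotate(Y, 0.7)
obj_original = objective(ratings, X, Y, 0.1)
obj_rotated = objective(ratings, X2, Y2, 0.1)
```

    Two runs can therefore end in visibly different factor matrices while 
reaching the same objective value, which is what the comparison above relies on.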
    
    There might be better methods for testing, but I considered this sufficient 
as the original `ALSITSuite` did nothing more. Of course, this test only checks 
whether the algorithm still produces the same results after some modification 
(e.g. an optimization); it does not check whether the algorithm was correct in 
the first place, but the same is true of the test for the original ALS. Do you 
know what assures us that the explicit ALS works correctly? (It should be fine, 
as I also checked the results of the explicit ALS against Spark on toy data.) 
AFAIK Spark generates random matrices of known rank, factorizes them, and checks 
whether the error is low (see their 
[ALSSuite](https://github.com/apache/spark/blob/master/mllib/src/test/scala/org/apache/spark/mllib/recommendation/ALSSuite.scala)).
 In the future, it might be worth following their approach.
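
    A minimal sketch of such a known-rank check, to make the idea concrete. 
This is not the code from Spark's `ALSSuite` or from Flink: it assumes a small, 
fully observed, noise-free rating matrix, a tiny regularization constant, and a 
dense pure-Python ALS. All function names, sizes, and thresholds are 
hypothetical.

```python
import random

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def update(R, F, lam):
    """Regularized least-squares update: one factor row per row of R, F fixed."""
    k = len(F[0])
    # Normal-equation matrix F^T F + lam * I, shared by all rows (full data).
    A = [[sum(f[a] * f[b] for f in F) + (lam if a == b else 0.0)
          for b in range(k)] for a in range(k)]
    return [solve(A, [sum(r * f[a] for r, f in zip(row, F)) for a in range(k)])
            for row in R]

def als(R, k, lam, iters, rng):
    """Alternate between user and item updates from random item factors."""
    Y = [[rng.uniform(-1, 1) for _ in range(k)] for _ in range(len(R[0]))]
    Rt = [list(col) for col in zip(*R)]
    for _ in range(iters):
        X = update(R, Y, lam)
        Y = update(Rt, X, lam)
    return X, Y

def rmse(R, X, Y):
    """Root-mean-square reconstruction error over all entries of R."""
    sq = [(R[u][i] - sum(a * b for a, b in zip(X[u], Y[i]))) ** 2
          for u in range(len(R)) for i in range(len(R[0]))]
    return (sum(sq) / len(sq)) ** 0.5

# Build a rating matrix of known rank k as the product of random factors.
rng = random.Random(42)
k, n_users, n_items = 2, 8, 6
U = [[rng.uniform(-1, 1) for _ in range(k)] for _ in range(n_users)]
V = [[rng.uniform(-1, 1) for _ in range(k)] for _ in range(n_items)]
R = [[sum(a * b for a, b in zip(u, v)) for v in V] for u in U]

# Factorize with the same rank and check that the reconstruction error is low.
X, Y = als(R, k, lam=1e-8, iters=20, rng=rng)
error = rmse(R, X, Y)
assert error < 1e-3
```

    Unlike a regression test against hand-generated expected values, a check 
of this kind compares against the known product rather than against a 
particular set of factors, so it does not depend on which solution the random 
initialization leads to.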


