Github user jfeher commented on the issue:
https://github.com/apache/flink/pull/2542
Hi, we have measured the training time of als and ials with the given
dataset.
After filtering the data to unique item-user pairs we got approximately 64
million rankings.
We measured on a four-node cluster running on YARN. Each node had
16 GB of memory.
The taskmanagers got 12 GB each and the jobmanager got 2 GB.
We had four taskmanagers, one for each node.
After some testing, a block number between 100 and 1500 appeared to work
best, and between 100 and 300 the running times were consistently low.
**For ials we got the following measurements:**
Average time for 1 iteration with block numbers between 100 and 1500:
2000.33 s
Average time for 1 iteration with block numbers between 100 and 300:
1729.44 s
More detailed results by block number are shown in this diagram:
http://imgur.com/LjJavti
**For als with the same configurations we got the following measurements:**
Average time for 1 iteration with block numbers between 100 and 1500:
1694.04 s
Average time for 1 iteration with block numbers between 100 and 300:
1465.77 s
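So on this data the ials version was roughly 300 s slower than als (about
306 s for block numbers 100 to 1500, about 264 s for 100 to 300).
When we increased the number of iterations to 10, the time difference stayed
under 1000 s, which is well below ten times 300 s.
This is because a large part of the total training time is a fixed cost that
does not grow with the number of iterations.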
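
To make the arithmetic behind this explicit, here is a small Python sketch using only the averages reported above. It is a back-of-the-envelope estimate, not something we measured: it assumes the ials/als gap grows linearly with the iteration count (gap(n) = fixed + n * per_iter), and it assumes the "under 1000 s" figure for 10 iterations applies to both block-number ranges. The variable names are made up for illustration.

```python
# Average 1-iteration training times in seconds, copied from the figures above.
ials = {"100-1500": 2000.33, "100-300": 1729.44}
als = {"100-1500": 1694.04, "100-300": 1465.77}

# Extra time ials needs compared to als for 1 iteration, per block-number range.
gap_1 = {k: round(ials[k] - als[k], 2) for k in ials}

# Assumption (not measured): gap(n) = fixed + n * per_iter.
# With gap(10) < 1000 s this gives 9 * per_iter < 1000 - gap(1).
GAP_10_UPPER = 1000.0
per_iter_upper = {k: round((GAP_10_UPPER - g) / 9, 2) for k, g in gap_1.items()}

# The fixed part of the gap is then at least gap(1) minus that upper bound.
fixed_lower = {k: round(g - per_iter_upper[k], 2) for k, g in gap_1.items()}

print("gap for 1 iteration (s):", gap_1)
print("upper bound on extra cost per iteration (s):", per_iter_upper)
print("lower bound on fixed part of the gap (s):", fixed_lower)
```

Under these assumptions the extra cost of ials per additional iteration would be below roughly 80 s, and most of the 1-iteration gap would be fixed overhead.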