Github user gaborhermann commented on the issue:
https://github.com/apache/flink/pull/2819
Hello Theodore,
Thanks for taking a look at this PR!
1. No problem. Some performance evaluations might help us in this case too.
I don't expect it to have much effect on performance anyway, as the 3-way
join is only used outside the iteration.
2. I really like the idea of doing a performance evaluation! I'm not
exactly sure how to do this with a `join` instead of a `coGroup`, so let me
sketch an implementation before elaborating on the pros and cons. (It was more
straightforward to use `coGroup`; I've put a rough comparison of the two shapes
after this list.)
3. The benefit of using more blocks is using less memory, just as in the
ALS algorithm, at the cost of more network traffic. However, more blocks here
mean much more network traffic and bookkeeping than in ALS, because there are
more Flink iterations. I've investigated this a bit further and found that the
main "bottleneck" is the maximum size of the sort-buffer, which was around
100 MB with around 40 GB of TaskManager memory, so large matrix blocks do not
fit into it. Even given enough memory, we therefore cannot use large blocks.
Unfortunately, the maximum size of the sort-buffer is not configurable, and I
would not like to change the underlying code in the core or runtime, but I
tried a workaround which might work: splitting the factor blocks into
subblocks (a rough sketch is after this list). I just need to run some
measurements. If this workaround turns out fine, then it should be okay to go
on with this and give instructions in the docs.
4. Okay, I agree.
I hope we can generalize this non-overlapping blocking to other
optimization problems :) I'll take a look at the paper you linked!
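
Regarding 2: to make the `join` vs. `coGroup` question concrete, here is a rough, untested sketch of the two shapes in the DataSet Scala API. The `userFactors`/`ratings` names and tuple layouts are placeholders, not the actual types in this PR; the point is only that `coGroup` sees the whole group at once, while `join` emits one record per matching pair and needs a separate `groupBy`/aggregate afterwards:

```scala
import org.apache.flink.api.scala._
import org.apache.flink.util.Collector

object JoinVsCoGroupSketch {
  def main(args: Array[String]): Unit = {
    val env = ExecutionEnvironment.getExecutionEnvironment

    // Placeholder data: (userId, factor) and (userId, itemId, rating)
    val userFactors: DataSet[(Int, Double)] =
      env.fromElements((1, 0.5), (2, 1.5))
    val ratings: DataSet[(Int, Int, Double)] =
      env.fromElements((1, 10, 3.0), (1, 11, 4.0), (2, 10, 5.0))

    // coGroup: one call per key with all matching elements from both sides,
    // so the per-user aggregation happens in a single pass.
    val coGrouped: DataSet[(Int, Double)] =
      userFactors.coGroup(ratings).where(0).equalTo(0) {
        (factors, rs, out: Collector[(Int, Double)]) =>
          val ratingSum = rs.map(_._3).sum // whole group available at once
          for ((userId, factor) <- factors) {
            out.collect((userId, ratingSum * factor))
          }
      }

    // join: one call per matching pair; the aggregation needs a separate
    // groupBy/sum, but the join itself can be executed as a hash join.
    val joined: DataSet[(Int, Double)] =
      userFactors.join(ratings).where(0).equalTo(0) {
        (f, r) => (f._1, r._3 * f._2)
      }.groupBy(0).sum(1)

    coGrouped.print()
    joined.print()
  }
}
```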
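
Regarding 3: a minimal, hypothetical sketch of what I mean by the subblock workaround. Each factor block is split into fixed-size chunks keyed by `(blockId, subBlockId)`, so that no single record handed to the sorter gets anywhere near the sort-buffer limit. The `FactorBlock` type and `subBlockSize` are only illustrative, not what the PR currently uses:

```scala
import org.apache.flink.api.scala._

object SubBlockSketch {
  // Placeholder representation of a factor block: block id plus the
  // (itemId, factorVector) pairs stored in that block.
  type FactorBlock = (Int, Array[(Long, Array[Double])])

  def splitIntoSubBlocks(
      factorBlocks: DataSet[FactorBlock],
      subBlockSize: Int): DataSet[((Int, Int), Array[(Long, Array[Double])])] = {
    // Each emitted record is a much smaller subblock, so individual records
    // stay small even when the logical block is large.
    factorBlocks.flatMap { block =>
      val (blockId, factors) = block
      factors.grouped(subBlockSize).zipWithIndex.map {
        case (chunk, subId) => ((blockId, subId), chunk)
      }
    }
  }
}
```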