Github user StephanEwen commented on the pull request:
https://github.com/apache/flink/pull/1058#issuecomment-135398622
Okay, looking at the "zipWithIndex" code, here is what really is the
problem:
Each function actually modifies the shared list by sorting it. The solution
proposed here fixes that by making sure every task has its own copy of the
list. That, btw, would have worked with a plain ArrayList as well.
CopyOnWriteArrayList seems a bit overkill.
A nicer way to solve this is IMHO to use a broadcast variable initializer,
which guarantees that the list is sorted once (by the first task that
accesses it) and that everyone then shares the same sorted list. This has
two benefits:
- Less memory consumption (not super critical, as we are talking about
small lists)
- Less work, since only one sort happens per TaskManager, rather than one
sort per task.
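To illustrate the "sort once, share the result" idea, here is a minimal
plain-Java sketch. It does not use Flink: SharedVariable, sortedCopy and
SORT_COUNT are hypothetical names made up for this example, and the
synchronized lazy initialization only mimics what I would expect from
RuntimeContext.getBroadcastVariableWithInitializer(...) together with a
BroadcastVariableInitializer. It is not the actual zipWithIndex code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

public class SortOnceSketch {

    // Hypothetical stand-in for one broadcast variable on one TaskManager.
    static final class SharedVariable<T, C> {
        private final List<T> rawData;
        private C initialized; // guarded by "this"

        SharedVariable(List<T> rawData) {
            this.rawData = rawData;
        }

        // The first caller runs the initializer; later callers get the cached result.
        synchronized C get(Function<List<T>, C> initializer) {
            if (initialized == null) {
                initialized = initializer.apply(rawData);
            }
            return initialized;
        }
    }

    // Counts how often the sort really runs (for illustration only).
    static final AtomicInteger SORT_COUNT = new AtomicInteger();

    // Plays the role of the initializer: sorts a copy and never touches the raw list.
    static List<Integer> sortedCopy(List<Integer> data) {
        SORT_COUNT.incrementAndGet();
        List<Integer> copy = new ArrayList<>(data);
        Collections.sort(copy);
        return Collections.unmodifiableList(copy);
    }

    public static void main(String[] args) throws InterruptedException {
        SharedVariable<Integer, List<Integer>> counts =
                new SharedVariable<>(Arrays.asList(3, 1, 2));

        // Four "tasks" on the same TaskManager access the same variable.
        Thread[] tasks = new Thread[4];
        for (int i = 0; i < tasks.length; i++) {
            tasks[i] = new Thread(() -> counts.get(SortOnceSketch::sortedCopy));
            tasks[i].start();
        }
        for (Thread t : tasks) {
            t.join();
        }
        System.out.println("sorts performed: " + SORT_COUNT.get());
    }
}
```

Because the shared result is wrapped as unmodifiable, no task can sort or
otherwise mutate it afterwards, which is the property the per-task copies
were trying to achieve.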