[
https://issues.apache.org/jira/browse/FLINK-2152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14716518#comment-14716518
]
ASF GitHub Bot commented on FLINK-2152:
---------------------------------------
Github user StephanEwen commented on the pull request:
https://github.com/apache/flink/pull/1058#issuecomment-135398622
Okay, looking at the "zipWithIndex" code, here is what the problem really
is:
Each function actually modifies the list by sorting it. The solution proposed
here solves that by making sure every function has its own copy of the list.
That, btw, would have worked with any ArrayList as well. CopyOnWriteList seems
a bit overkill.
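To illustrate the "own copy" point, here is a minimal plain-Java sketch. It does not use the Flink API; the class and method names are hypothetical and are not taken from the pull request. Each caller copies the shared list into an ordinary ArrayList and sorts the copy, so the shared list is never modified.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only; not code from the pull request.
class DefensiveCopySketch {

    // Returns a sorted private copy and leaves the shared list untouched.
    static List<Integer> sortedCopy(List<Integer> shared) {
        List<Integer> copy = new ArrayList<>(shared); // private copy per caller
        Collections.sort(copy);                       // sorts only the copy
        return copy;
    }

    public static void main(String[] args) {
        List<Integer> shared = Arrays.asList(3, 1, 2);
        List<Integer> mine = sortedCopy(shared);
        System.out.println("shared = " + shared + ", mine = " + mine);
    }
}
```

The cost is one copy and one sort per caller, which is what the broadcast-initializer suggestion avoids.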
A nicer way to solve this is IMHO to use a broadcast variable initializer,
which would guarantee that the list is sorted once (by the first one that
accesses it) and then everyone shares the same sorted list.
- Less memory consumption (not super critical, as we are talking about
small lists)
- Less work, since only one sort happens per TaskManager, rather than one
sort per task.
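The "sort once, then share" idea can be sketched in plain Java as follows. This is only an emulation under stated assumptions: it does not call Flink, and `Count`, `SharedSortedCounts`, and `startOffset` are hypothetical names. In Flink itself the initializer would presumably be supplied through the runtime context's broadcast-variable-with-initializer mechanism. Here, the first caller sorts a copy of the raw (subtask, count) records, and every later caller receives the same read-only sorted list.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical emulation of "initialize once, share the result"; not Flink code.
class SortOnceSketch {

    // Stand-in for one broadcast record: how many elements a subtask holds.
    static final class Count {
        final int subtask;
        final long elements;

        Count(int subtask, long elements) {
            this.subtask = subtask;
            this.elements = elements;
        }
    }

    // Holds the raw records and sorts them lazily, exactly once.
    static final class SharedSortedCounts {
        private final List<Count> raw;
        private List<Count> sorted;                    // guarded by "this"
        final AtomicInteger sortCalls = new AtomicInteger();

        SharedSortedCounts(List<Count> raw) {
            this.raw = raw;
        }

        synchronized List<Count> get() {
            if (sorted == null) {                      // only the first caller sorts
                List<Count> copy = new ArrayList<>(raw);
                copy.sort(Comparator.comparingInt(c -> c.subtask));
                sortCalls.incrementAndGet();
                sorted = Collections.unmodifiableList(copy);
            }
            return sorted;                             // shared, read-only view
        }
    }

    // First index of a subtask: sum of the counts of all lower-numbered subtasks.
    static long startOffset(List<Count> sortedCounts, int subtask) {
        long offset = 0;
        for (Count c : sortedCounts) {
            if (c.subtask >= subtask) {
                break;
            }
            offset += c.elements;
        }
        return offset;
    }
}
```

Because the shared list is wrapped as unmodifiable, no task can re-sort it in place by accident, which is the failure mode described above.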
> Provide zipWithIndex utility in flink-contrib
> ---------------------------------------------
>
> Key: FLINK-2152
> URL: https://issues.apache.org/jira/browse/FLINK-2152
> Project: Flink
> Issue Type: Improvement
> Components: Java API
> Reporter: Robert Metzger
> Assignee: Andra Lungu
> Priority: Trivial
> Labels: starter
> Fix For: 0.10
>
>
> We should provide a simple utility method for zipping elements in a data set
> with a dense index.
> It's up for discussion whether we want it directly in the API or if we should
> provide it only as a utility from {{flink-contrib}}.
> I would put it in {{flink-contrib}}.
> See my answer on SO:
> http://stackoverflow.com/questions/30596556/zipwithindex-on-apache-flink