[ 
https://issues.apache.org/jira/browse/FLINK-2152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14716518#comment-14716518
 ] 

ASF GitHub Bot commented on FLINK-2152:
---------------------------------------

Github user StephanEwen commented on the pull request:

    https://github.com/apache/flink/pull/1058#issuecomment-135398622
  
    Okay, looking at the "zipWithIndex" code, here is what the problem 
really is:
    
    Each function actually modifies the list, by sorting it. The solution 
proposed here solves it by making sure every function has its own copy of the 
list. That, btw, would have worked with any ArrayList as well. CopyOnWriteList 
seems a bit overkill.
    
    A nicer way to solve this is IMHO to use a broadcast variable initializer, 
which would guarantee that the list is sorted once (by the first task that 
accesses it) and that every task then shares the same sorted list.
      - Less memory consumption (not super critical, as we are talking about 
small lists)
      - Less work, since only one sort happens per TaskManager, rather than one 
sort per task.
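
    To illustrate the "sort once, then share" idea, here is a minimal sketch 
in plain Java. It does not use the Flink API: the class SharedSortedCounts, its 
methods, and the (partition id, element count) layout of the list entries are 
all assumptions made for illustration. In Flink itself the same effect would 
presumably be obtained through RuntimeContext.getBroadcastVariableWithInitializer 
and a BroadcastVariableInitializer that does the sorting.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a broadcast variable shared by all tasks in one JVM. */
final class SharedSortedCounts {
    // Assumed layout: each entry is {partitionId, elementCount}.
    private final List<long[]> raw;
    // Published once by the first accessor; read-only afterwards.
    private volatile List<long[]> sorted;
    // Only here to make the "one sort" property observable.
    private final AtomicInteger sorts = new AtomicInteger();

    SharedSortedCounts(List<long[]> raw) {
        this.raw = raw;
    }

    /** Returns the shared sorted view; the sort runs at most once. */
    List<long[]> get() {
        List<long[]> result = sorted;
        if (result == null) {
            synchronized (this) {
                result = sorted;
                if (result == null) {
                    // Sort a private copy so the broadcast input is never mutated.
                    List<long[]> copy = new ArrayList<>(raw);
                    copy.sort((a, b) -> Long.compare(a[0], b[0]));
                    sorts.incrementAndGet();
                    result = Collections.unmodifiableList(copy);
                    sorted = result;
                }
            }
        }
        return result;
    }

    int sortCount() {
        return sorts.get();
    }

    /** Start index of a partition: sum of the counts of all smaller partition ids. */
    static long startOffset(List<long[]> sortedCounts, int partitionId) {
        long offset = 0;
        for (long[] entry : sortedCounts) {
            if (entry[0] >= partitionId) {
                break;
            }
            offset += entry[1];
        }
        return offset;
    }

    public static void main(String[] args) throws InterruptedException {
        List<long[]> raw = new ArrayList<>();
        raw.add(new long[] {2, 5});
        raw.add(new long[] {0, 3});
        raw.add(new long[] {1, 4});
        SharedSortedCounts shared = new SharedSortedCounts(raw);

        // Three "tasks" read the same instance concurrently.
        Thread[] tasks = new Thread[3];
        for (int i = 0; i < tasks.length; i++) {
            final int partitionId = i;
            tasks[i] = new Thread(() -> System.out.println(
                "partition " + partitionId + " starts at "
                    + startOffset(shared.get(), partitionId)));
            tasks[i].start();
        }
        for (Thread t : tasks) {
            t.join();
        }
        System.out.println("sorts performed: " + shared.sortCount());
    }
}
```

    Because the shared view is unmodifiable and nobody sorts in place, the 
tasks can no longer interfere with each other, and no per-task copy is needed.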


> Provide zipWithIndex utility in flink-contrib
> ---------------------------------------------
>
>                 Key: FLINK-2152
>                 URL: https://issues.apache.org/jira/browse/FLINK-2152
>             Project: Flink
>          Issue Type: Improvement
>          Components: Java API
>            Reporter: Robert Metzger
>            Assignee: Andra Lungu
>            Priority: Trivial
>              Labels: starter
>             Fix For: 0.10
>
>
> We should provide a simple utility method for zipping elements in a data set 
> with a dense index.
> It's up for discussion whether we want it directly in the API or if we should 
> provide it only as a utility from {{flink-contrib}}.
> I would put it in {{flink-contrib}}.
> See my answer on SO: 
> http://stackoverflow.com/questions/30596556/zipwithindex-on-apache-flink



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
