[ 
https://issues.apache.org/jira/browse/FLINK-2152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14585871#comment-14585871
 ] 

ASF GitHub Bot commented on FLINK-2152:
---------------------------------------

Github user tillrohrmann commented on a diff in the pull request:

    https://github.com/apache/flink/pull/832#discussion_r32415490
  
    --- Diff: 
flink-java/src/main/java/org/apache/flink/api/java/utils/DataSetUtils.java ---
    @@ -0,0 +1,106 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.flink.api.java.utils;
    +
    +import org.apache.flink.api.common.functions.RichMapPartitionFunction;
    +import org.apache.flink.api.java.DataSet;
    +import org.apache.flink.api.java.tuple.Tuple2;
    +import org.apache.flink.configuration.Configuration;
    +import org.apache.flink.util.Collector;
    +
    +import java.util.Collections;
    +import java.util.Comparator;
    +import java.util.List;
    +
    +/**
    + * This class provides simple utility methods for zipping elements in a 
file with an index.
    + *
    + * @param <T> The type of the DataSet, i.e., the type of the elements of 
the DataSet.
    + */
    +public class DataSetUtils<T> {
    +
    +   /**
    +    * Method that goes over all the elements in each partition in order to 
retireve
    +    * the total number of elements.
    +    *
    +    * @param input the DataSet received as input
    +    * @return a data set containing tuples of subtask index, number of 
elements mappings.
    +    */
    +   public DataSet<Tuple2<Integer, Long>> countElements(DataSet<T> input) {
    +           return input.mapPartition(new RichMapPartitionFunction<T, 
Tuple2<Integer,Long>>() {
    +                   @Override
    +                   public void mapPartition(Iterable<T> values, 
Collector<Tuple2<Integer, Long>> out) throws Exception {
    +                           long counter = 0;
    +                           for(T value: values) {
    +                                   counter ++;
    +                           }
    +
    +                           out.collect(new Tuple2<Integer, 
Long>(getRuntimeContext().getIndexOfThisSubtask(), counter));
    +                   }
    +           });
    +   }
    +
    +   /**
    +    * Method that takes a set of subtask index, total number of elements 
mappings
    +    * and assigns ids to all the elements from the input data set.
    +    *
    +    * @param input the input data set
    +    * @return a data set of tuple 2 consisting of consecutive ids and 
initial values.
    +    */
    +   public DataSet<Tuple2<Long, T>> zipWithIndex(DataSet<T> input) {
    --- End diff --
    
    But you still would have to import the implicit cast for the pimp my 
library class to make it usable. Thus it is only syntactic sugar for writing 
something like `DataSetUtils.zipWithIndex(ds)`.


> Provide zipWithIndex utility in flink-contrib
> ---------------------------------------------
>
>                 Key: FLINK-2152
>                 URL: https://issues.apache.org/jira/browse/FLINK-2152
>             Project: Flink
>          Issue Type: Improvement
>          Components: Java API
>            Reporter: Robert Metzger
>            Assignee: Andra Lungu
>            Priority: Trivial
>              Labels: starter
>
> We should provide a simple utility method for zipping elements in a data set 
> with a dense index.
> its up for discussion whether we want it directly in the API or if we should 
> provide it only as a utility from {{flink-contrib}}.
> I would put it in {{flink-contrib}}.
> See my answer on SO: 
> http://stackoverflow.com/questions/30596556/zipwithindex-on-apache-flink



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to