[ https://issues.apache.org/jira/browse/FLINK-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14728625#comment-14728625 ]
ASF GitHub Bot commented on FLINK-2030: --------------------------------------- Github user chiwanpark commented on a diff in the pull request: https://github.com/apache/flink/pull/861#discussion_r38619843 --- Diff: flink-java/src/main/java/org/apache/flink/api/java/utils/DataSetUtils.java --- @@ -248,6 +251,58 @@ public void mapPartition(Iterable<T> values, Collector<Tuple2<Long, T>> out) thr input.getType(), sampleInCoordinator, callLocation); } + /** + * Creates a {@link org.apache.flink.api.common.accumulators.DiscreteHistogram} from the data set + * + * @param data Discrete valued data set + * @return A histogram over data + */ + public static DataSet<DiscreteHistogram> createDiscreteHistogram(DataSet<Double> data) { + return data.mapPartition(new RichMapPartitionFunction<Double, DiscreteHistogram>() { + @Override + public void mapPartition(Iterable<Double> values, Collector<DiscreteHistogram> out) + throws Exception { + DiscreteHistogram histogram = new DiscreteHistogram(); + for (double value : values) { + histogram.add(value); + } + out.collect(histogram); + } + }).reduce(new ReduceFunction<DiscreteHistogram>() { + @Override + public DiscreteHistogram reduce(DiscreteHistogram value1, DiscreteHistogram value2) throws Exception { + value1.merge(value2); + return value1; + } + }); + } + + /** + * Creates a {@link org.apache.flink.api.common.accumulators.DiscreteHistogram} from the data set + * + * @param data Discrete valued data set + * @param bins Number of bins in the histogram + * @return A histogram over data + */ + public static DataSet<ContinuousHistogram> createContinuousHistogram(DataSet<Double> data, final int bins) { + return data.mapPartition(new RichMapPartitionFunction<Double, ContinuousHistogram>() { + @Override + public void mapPartition(Iterable<Double> values, Collector<ContinuousHistogram> out) + throws Exception { --- End diff -- Same here (unnecessary new line) > Implement an online histogram with Merging and equalization features > -------------------------------------------------------------------- > > Key: FLINK-2030 > URL: https://issues.apache.org/jira/browse/FLINK-2030 > Project: Flink > Issue Type: Sub-task > Components: Machine Learning Library > Reporter: Sachin Goel > Assignee: Sachin Goel > Priority: Minor > Labels: ML > > For the implementation of the decision tree in > https://issues.apache.org/jira/browse/FLINK-1727, we need to implement an > histogram with online updates, merging and equalization features. A reference > implementation is provided in [1] > [1].http://www.jmlr.org/papers/volume11/ben-haim10a/ben-haim10a.pdf -- This message was sent by Atlassian JIRA (v6.3.4#6332)