Repository: flink Updated Branches: refs/heads/master 273f54ba4 -> 9a2057310
[FLINK-3630] [docs] Little mistake in documentation This closes #2254 Project: http://git-wip-us.apache.org/repos/asf/flink/repo Commit: http://git-wip-us.apache.org/repos/asf/flink/commit/9a205731 Tree: http://git-wip-us.apache.org/repos/asf/flink/tree/9a205731 Diff: http://git-wip-us.apache.org/repos/asf/flink/diff/9a205731 Branch: refs/heads/master Commit: 9a2057310eb22bae2601f2635b201e3a168a0feb Parents: 273f54b Author: Greg Hogan <[email protected]> Authored: Thu Jul 14 14:42:55 2016 -0400 Committer: zentol <[email protected]> Committed: Fri Jul 15 11:55:21 2016 +0200 ---------------------------------------------------------------------- docs/apis/batch/dataset_transformations.md | 112 +++++++++++++++++++----- 1 file changed, 88 insertions(+), 24 deletions(-) ---------------------------------------------------------------------- http://git-wip-us.apache.org/repos/asf/flink/blob/9a205731/docs/apis/batch/dataset_transformations.md ---------------------------------------------------------------------- diff --git a/docs/apis/batch/dataset_transformations.md b/docs/apis/batch/dataset_transformations.md index 8e65389..9be9bc0 100644 --- a/docs/apis/batch/dataset_transformations.md +++ b/docs/apis/batch/dataset_transformations.md @@ -247,6 +247,13 @@ DataSet<Tuple1<String>> ds2 = ds.<Tuple1<String>>project(0).distinct(0); ~~~ </div> +<div data-lang="scala" markdown="1"> + +~~~scala +Not supported. +~~~ + +</div> <div data-lang="python" markdown="1"> ~~~python @@ -777,11 +784,15 @@ DataSet<Tuple2<String, Integer>> combinedWords = input .combineGroup(new GroupCombineFunction<String, Tuple2<String, Integer>() { public void combine(Iterable<String> words, Collector<Tuple2<String, Integer>>) { // combine + String key = null; int count = 0; + for (String word : words) { + key = word; count++; } - out.collect(new Tuple2(word, count)); + // emit tuple with word and count + out.collect(new Tuple2(key, count)); } }); @@ -790,11 +801,15 @@ DataSet<Tuple2<String, Integer>> output = combinedWords .reduceGroup(new GroupReduceFunction() { // group reduce with full data exchange public void reduce(Iterable<Tuple2<String, Integer>>, Collector<Tuple2<String, Integer>>) { + String key = null; int count = 0; + for (Tuple2<String, Integer> word : words) { + key = word; count++; } - out.collect(new Tuple2(word, count)); + // emit tuple with word and count + out.collect(new Tuple2(key, count)); } }); ~~~ @@ -809,27 +824,40 @@ val combinedWords: DataSet[(String, Int)] = input .groupBy(0) .combineGroup { (words, out: Collector[(String, Int)]) => + var key: String = null var count = 0 + for (word <- words) { - count++ + key = word + count += 1 } - out.collect(word, count) + out.collect((key, count)) } val output: DataSet[(String, Int)] = combinedWords .groupBy(0) .reduceGroup { (words, out: Collector[(String, Int)]) => - var count = 0 - for ((word, Int) <- words) { - count++ + var key: String = null + var sum = 0 + + for ((word, sum) <- words) { + key = word + sum += count } - out.collect(word, count) + out.collect((key, sum)) } ~~~ </div> +<div data-lang="python" markdown="1"> + +~~~python +Not supported. +~~~ + +</div> </div> The above alternative WordCount implementation demonstrates how the GroupCombine @@ -1418,10 +1446,35 @@ DataSet<Tuple2<String, Double>> ratings.join(weights) // [...] ~~~ +</div> +<div data-lang="scala" markdown="1"> + +~~~scala +case class Rating(name: String, category: String, points: Int) + +val ratings: DataSet[Ratings] = // [...] +val weights: DataSet[(String, Double)] = // [...] + +val weightedRatings = ratings.join(weights).where("category").equalTo(0) { + (rating, weight, out: Collector[(String, Double)]) => + if (weight._2 > 0.1) out.collect(rating.name, rating.points * weight._2) +} + +~~~ + +</div> +<div data-lang="python" markdown="1"> +Not supported. +</div> +</div> + #### Join with Projection (Java/Python Only) A Join transformation can construct result tuples using a projection as shown here: +<div class="codetabs" markdown="1"> +<div data-lang="java" markdown="1"> + ~~~java DataSet<Tuple3<Integer, Byte, String>> input1 = // [...] DataSet<Tuple2<Integer, Double>> input2 = // [...] @@ -1443,25 +1496,12 @@ The join projection works also for non-Tuple DataSets. In this case, `projectFir <div data-lang="scala" markdown="1"> ~~~scala -case class Rating(name: String, category: String, points: Int) - -val ratings: DataSet[Ratings] = // [...] -val weights: DataSet[(String, Double)] = // [...] - -val weightedRatings = ratings.join(weights).where("category").equalTo(0) { - (rating, weight, out: Collector[(String, Double)]) => - if (weight._2 > 0.1) out.collect(rating.name, rating.points * weight._2) -} - +Not supported. ~~~ </div> <div data-lang="python" markdown="1"> -#### Join with Projection (Java/Python Only) - -A Join transformation can construct result tuples using a projection as shown here: - ~~~python result = input1.join(input2).where(0).equal_to(0) \ .project_first(0,2).project_second(1).project_first(1); @@ -1709,6 +1749,23 @@ DataSet<Tuple2<String, Integer>> movies.leftOuterJoin(ratings) // [...] ~~~ +</div> +<div data-lang="scala" markdown="1"> + +~~~scala +Not supported. +~~~ + +</div> +<div data-lang="python" markdown="1"> + +~~~python +Not supported. +~~~ + +</div> +</div> + #### Join Algorithm Hints The Flink runtime can execute outer joins in various ways. Each possible way outperforms the others under @@ -2200,7 +2257,7 @@ DataSet<Tuple2<String, Integer>> in = // [...] // in descending order on the first String field. // Apply a MapPartition transformation on the sorted partitions. DataSet<Tuple2<String, String>> out = in.sortPartition(1, Order.ASCENDING) - .sortPartition(0, Order.DESCENDING) + .sortPartition(0, Order.DESCENDING) .mapPartition(new PartitionMapper()); ~~~ @@ -2213,11 +2270,18 @@ val in: DataSet[(String, Int)] = // [...] // in descending order on the first String field. // Apply a MapPartition transformation on the sorted partitions. val out = in.sortPartition(1, Order.ASCENDING) - .sortPartition(0, Order.DESCENDING) + .sortPartition(0, Order.DESCENDING) .mapPartition { ... } ~~~ </div> +<div data-lang="python" markdown="1"> + +~~~python +Not supported. +~~~ + +</div> </div> ### First-n
