sirenbyte commented on code in PR #23577:
URL: https://github.com/apache/beam/pull/23577#discussion_r1116007037


##########
learning/tour-of-beam/learning-content/common-transforms/aggregation/count/description.md:
##########
@@ -0,0 +1,311 @@
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Count
+
+`Count` provides several transforms for counting the values in a 
`PCollection`, either globally or for each key.
+
+{{if (eq .Sdk "go")}}
+Counts the number of elements within each aggregation. The Count transform has 
two varieties:
+
+You can count the number of elements in a ```PCollection``` with 
```CountElms()```; it returns a ```PCollection``` containing a single element.
+
+```
+import (
+    "github.com/apache/beam/sdks/go/pkg/beam"
+    "github.com/apache/beam/sdks/go/pkg/beam/transforms/stats"
+)
+
+func ApplyTransform(s beam.Scope, input beam.PCollection) beam.PCollection {
+    return stats.CountElms(s, input)
+}
+```
+
+You can use ```Count()``` to count how many times each distinct element 
appears in a ```PCollection```; the result is one key-value output per distinct element.
+
+```
+import (
+    "github.com/apache/beam/sdks/go/pkg/beam"
+    "github.com/apache/beam/sdks/go/pkg/beam/transforms/stats"
+)
+
+func ApplyTransform(s beam.Scope, input beam.PCollection) beam.PCollection {
+    return stats.Count(s, input)
+}
+```
+{{end}}
+{{if (eq .Sdk "java")}}
+Counts the number of elements within each aggregation. The Count transform has 
three varieties:
+
+### Counting all elements in a PCollection
+
+```Count.globally()``` counts the number of elements in the entire 
PCollection. The result is a collection with a single element.
+
+```
+PCollection<Integer> numbers = pipeline.apply(Create.of(1, 2, 3, 4, 5, 6, 7, 
8, 9, 10));
+PCollection<Long> output = numbers.apply(Count.globally());
+```
+
+Output
+```
+10
+```
+
+### Counting elements for each key
+
+```Count.perKey()``` counts how many elements are associated with each key. It 
ignores the values. The resulting collection has one output for every key in 
the input collection.
+
+```
+PCollection<KV<String, Integer>> input = pipeline.apply(
+    Create.of(KV.of("🥕", 3),
+              KV.of("🥕", 2),
+              KV.of("🍆", 1),
+              KV.of("🍅", 4),
+              KV.of("🍅", 5),
+              KV.of("🍅", 3)));
+PCollection<KV<String, Long>> output = input.apply(Count.perKey());
+```
+
+Output
+
+```
+KV{🥕, 2}
+KV{🍅, 3}
+KV{🍆, 1}
+```
+
+### Counting all unique elements
+
+```Count.perElement()``` counts how many times each element appears in the 
input collection. The output collection contains key-value pairs, each holding a 
unique element and the number of times it appeared in the original collection.
+
+```
+PCollection<KV<String, Integer>> input = pipeline.apply(
+    Create.of(KV.of("🥕", 3),
+              KV.of("🥕", 2),
+              KV.of("🍆", 1),
+              KV.of("🍅", 3),
+              KV.of("🍅", 5),
+              KV.of("🍅", 3)));
+PCollection<KV<String, Long>> output = input.apply(Count.perElement());
+```
+
+Output
+
+```
+KV{KV{🍅, 3}, 2}
+KV{KV{🥕, 2}, 1}
+KV{KV{🍆, 1}, 1}
+KV{KV{🥕, 3}, 1}
+KV{KV{🍅, 5}, 1}
+```
+{{end}}
+{{if (eq .Sdk "python")}}
+### Counting all elements in a PCollection
+
+You can use ```Count.Globally()``` to count all elements in a PCollection, 
even if there are duplicate elements.
+
+```
+import apache_beam as beam
+
+with beam.Pipeline() as pipeline:
+  total_elements = (
+      pipeline
+      | 'Create plants' >> beam.Create(
+          ['🍓', '🥕', '🥕', '🥕', '🍆', '🍆', '🍅', '🍅', '🍅', '🌽'])
+      | 'Count all elements' >> beam.combiners.Count.Globally()
+      | beam.Map(print))
+```
+
+Output
+
+```
+10
+```
+
+### Counting elements for each key
+
+You can use ```Count.PerKey()``` to count the elements for each unique key in 
a PCollection of key-values.
+
+```
+import apache_beam as beam
+
+with beam.Pipeline() as pipeline:
+  total_elements_per_keys = (
+      pipeline
+      | 'Create plants' >> beam.Create([
+          ('spring', '🍓'),
+          ('spring', '🥕'),
+          ('summer', '🥕'),
+          ('fall', '🥕'),
+          ('spring', '🍆'),
+          ('winter', '🍆'),
+          ('spring', '🍅'),
+          ('summer', '🍅'),
+          ('fall', '🍅'),
+          ('summer', '🌽'),
+      ])
+      | 'Count elements per key' >> beam.combiners.Count.PerKey()
+      | beam.Map(print))
+```
+
+Output
+
+```
+('spring', 4)
+('summer', 3)
+('fall', 2)
+('winter', 1)
+```
+
+### Counting all unique elements
+
+You can use ```Count.PerElement()``` to count how many times each unique 
element occurs in a PCollection.
+
+```
+import apache_beam as beam
+
+with beam.Pipeline() as pipeline:
+  total_unique_elements = (
+      pipeline
+      | 'Create produce' >> beam.Create(
+          ['🍓', '🥕', '🥕', '🥕', '🍆', '🍆', '🍅', '🍅', '🍅', '🌽'])
+      | 'Count unique elements' >> beam.combiners.Count.PerElement()
+      | beam.Map(print))
+```
+
+Output
+
+```
+('🍓', 1)
+('🥕', 3)
+('🍆', 2)
+('🍅', 3)
+('🌽', 1)
+```
+{{end}}
+### Playground exercise
+
+You can find the full code of this example in the playground window, which you 
can run and experiment with.
+{{if (eq .Sdk "python")}}
+`Count.Globally` returns the number of integers in the `PCollection`. If you 
replace the integer input with the key-value input below and replace 
`beam.combiners.Count.Globally` with `beam.combiners.Count.PerKey`, the pipeline 
outputs the count for each key:
+
+```
+beam.Create([
+    (1, 36),
+    (2, 91),
+    (3, 33),
+    (3, 11),
+    (4, 67),
+]) | beam.combiners.Count.PerKey()
+```
+
+And Count transforms work with strings too! Can you change the example to 
count the number of words in a given sentence and how often each word occurs?

Review Comment:
   Done
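
For readers of this thread, the three Count variants in the quoted description.md can be modelled in plain Python. This is only a rough sketch of the semantics, not Beam code: the function names `count_globally`, `count_per_key` and `count_per_element` are hypothetical, ordinary lists stand in for a `PCollection`, and the sentence is an invented input for the playground exercise.

```python
from collections import Counter


def count_globally(elements):
    """Model of Count.Globally: a single number for the whole collection."""
    return sum(1 for _ in elements)


def count_per_key(pairs):
    """Model of Count.PerKey: count elements per key, ignoring the values."""
    return dict(Counter(key for key, _ in pairs))


def count_per_element(elements):
    """Model of Count.PerElement: occurrences of each distinct element."""
    return dict(Counter(elements))


# Playground exercise: count the words in a sentence and how often each occurs.
words = "the quick brown fox jumps over the lazy dog the end".split()

print(count_globally(words))            # 11
print(count_per_element(words)["the"])  # 3
print(count_per_key([(1, 36), (2, 91), (3, 33), (3, 11), (4, 67)]))
# {1: 1, 2: 1, 3: 2, 4: 1}
```

In a real pipeline the same exercise would split the sentence into words first and then apply `beam.combiners.Count.Globally()` and `beam.combiners.Count.PerElement()` to the resulting collection.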



##########
learning/tour-of-beam/learning-content/common-transforms/filter/description.md:
##########
@@ -0,0 +1,421 @@
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+### Using Filter
+
+A PCollection can be filtered using the Filter transform. You create a 
filter by supplying a predicate; when applied, the transform removes every 
element of the PCollection that doesn't satisfy the predicate.
+
+{{if (eq .Sdk "go")}}
+```
+import (
+  "github.com/apache/beam/sdks/go/pkg/beam"
+  "github.com/apache/beam/sdks/go/pkg/beam/transforms/filter"
+)
+
+func ApplyTransform(s beam.Scope, input beam.PCollection) beam.PCollection {
+  return filter.Exclude(s, input, func(element int) bool {
+    return element % 2 == 1
+  })
+}
+```
+{{end}}
+{{if (eq .Sdk "java")}}
+```
+PCollection<String> allStrings = pipeline
+        .apply(Create.of(List.of("Hello","world","Hi")));
+
+PCollection<String> filteredStrings = allStrings
+        .apply(Filter.by(new SerializableFunction<String, Boolean>() {
+            @Override
+            public Boolean apply(String input) {
+                return input.length() > 3;
+            }
+        }));
+```
+
+Output
+
+```
+Hello
+world
+```
+
+### Built-in filters
+
+The Java SDK has several built-in filter methods, such as Filter.greaterThan and 
Filter.lessThan. With Filter.greaterThan, the input PCollection is 
filtered so that only the elements whose values are greater than the specified 
amount remain. Similarly, Filter.lessThan keeps only the elements of 
the input PCollection whose values are less than the specified amount.
+
+The built-in comparison filters are:
+
+* Filter.greaterThanEq
+* Filter.greaterThan
+* Filter.lessThan
+* Filter.lessThanEq
+* Filter.equal
+
+
+## Example 2: Filtering with built-in methods
+
+```
+// List of integers
+PCollection<Integer> numbers = pipeline.apply(Create.of(List.of(1, 2, 3, 4, 5, 
6, 7, 8, 9, 10)));

Review Comment:
   Changed in all modules
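
As a companion to the quoted filter description, here is a minimal plain-Python sketch of the filtering semantics it describes. It is not Beam code: `filter_by` and `exclude` are hypothetical helper names, lists stand in for a `PCollection`, and the threshold of 5 is an arbitrary value chosen for illustration.

```python
def filter_by(elements, predicate):
    """Keep only the elements that satisfy the predicate (like Filter.by)."""
    return [e for e in elements if predicate(e)]


def exclude(elements, predicate):
    """Drop the elements that satisfy the predicate (like Go's filter.Exclude)."""
    return [e for e in elements if not predicate(e)]


numbers = list(range(1, 11))

# Mirrors the Java string example: keep strings longer than three characters.
print(filter_by(["Hello", "world", "Hi"], lambda s: len(s) > 3))
# ['Hello', 'world']

# Comparison predicates in the spirit of Filter.greaterThan / Filter.lessThan.
print(filter_by(numbers, lambda n: n > 5))  # [6, 7, 8, 9, 10]
print(filter_by(numbers, lambda n: n < 5))  # [1, 2, 3, 4]

# Mirrors the Go example: exclude odd numbers, leaving the even ones.
print(exclude(numbers, lambda n: n % 2 == 1))  # [2, 4, 6, 8, 10]
```

Note that keeping and excluding are complements: the Go snippet in the diff removes the elements that match its predicate, while the Java `Filter.by` example keeps them.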



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
