Repository: beam-site
Updated Branches:
  refs/heads/asf-site ef362b546 -> 33b13882e


add CoGroupByKey to chapter 'Using GroupByKey'


Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/0910783b
Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/0910783b
Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/0910783b

Branch: refs/heads/asf-site
Commit: 0910783b69eff750c05b6c94271823d65adbaff3
Parents: ef362b5
Author: mingmxu <[email protected]>
Authored: Mon Apr 17 11:33:02 2017 -0700
Committer: Dan Halperin <[email protected]>
Committed: Mon Apr 17 11:33:02 2017 -0700

----------------------------------------------------------------------
 src/documentation/programming-guide.md | 29 ++++++++++++++++++++++++++++-
 1 file changed, 28 insertions(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/beam-site/blob/0910783b/src/documentation/programming-guide.md
----------------------------------------------------------------------
diff --git a/src/documentation/programming-guide.md b/src/documentation/programming-guide.md
index d5453ba..4b9f960 100644
--- a/src/documentation/programming-guide.md
+++ b/src/documentation/programming-guide.md
@@ -465,8 +465,35 @@ tree, [2]
 
 Thus, `GroupByKey` represents a transform from a multimap (multiple keys to individual values) to a uni-map (unique keys to collections of values).
 
-> **A Note on Key/Value Pairs:** Beam represents key/value pairs slightly differently depending on the language and SDK you're using. In the Beam SDK for Java, you represent a key/value pair with an object of type `KV<K, V>`. In Python, you represent key/value pairs with 2-tuples.
+##### **Joins with CoGroupByKey**
+
+`CoGroupByKey` joins two or more key/value `PCollection`s that have the same key type, and then emits a collection of `KV<K, CoGbkResult>` pairs. [Design Your Pipeline]({{ site.baseurl }}/documentation/pipelines/design-your-pipeline/#multiple-sources) shows an example pipeline that uses a join.
+
+Given the input collections below:
+```
+// collection 1
+user1, address1
+user2, address2
+user3, address3
 
+// collection 2
+user1, order1
+user1, order2
+user2, order3
+guest, order4
+...
+```
+
+`CoGroupByKey` gathers the values with the same key from all `PCollection`s and outputs a new pair consisting of the unique key and a `CoGbkResult` object containing all values that were associated with that key. If you apply `CoGroupByKey` to the input collections above, the output collection would look like this:
+```
+user1, [[address1], [order1, order2]]
+user2, [[address2], [order3]]
+user3, [[address3], []]
+guest, [[], [order4]]
+...
+```
+
+> **A Note on Key/Value Pairs:** Beam represents key/value pairs slightly differently depending on the language and SDK you're using. In the Beam SDK for Java, you represent a key/value pair with an object of type `KV<K, V>`. In Python, you represent key/value pairs with 2-tuples.
 
 #### <a name="transforms-combine"></a>Using Combine
 

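The join semantics that the added `CoGroupByKey` section describes can be modeled in a few lines of plain Python. This is a minimal sketch for illustration only, not Beam's implementation or API: the function name `co_group_by_key` is hypothetical, and ordinary in-memory lists of 2-tuples stand in for key/value `PCollection`s, reusing the sample data from the diff. Each key maps to one list per input collection, so a key that is absent from a collection (such as `guest` in the address data) gets an empty list in that position.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Tuple


def co_group_by_key(
    *collections: Iterable[Tuple[Hashable, object]]
) -> Dict[Hashable, Tuple[List[object], ...]]:
    """Hypothetical in-memory model of CoGroupByKey output (not Beam's API).

    Returns a dict mapping each key to a tuple with one list of values
    per input collection, in the order the collections were passed.
    """
    n = len(collections)
    # Every key gets n lists up front, so a key missing from some
    # collection still has an empty list in that collection's slot.
    grouped: Dict[Hashable, List[List[object]]] = defaultdict(
        lambda: [[] for _ in range(n)]
    )
    for index, collection in enumerate(collections):
        for key, value in collection:
            grouped[key][index].append(value)
    return {key: tuple(lists) for key, lists in grouped.items()}


# Sample data taken from the two input collections in the diff.
addresses = [("user1", "address1"), ("user2", "address2"), ("user3", "address3")]
orders = [("user1", "order1"), ("user1", "order2"), ("user2", "order3"), ("guest", "order4")]

result = co_group_by_key(addresses, orders)
for key, (key_addresses, key_orders) in result.items():
    print(key, [key_addresses, key_orders])
# user1 [['address1'], ['order1', 'order2']]
# user2 [['address2'], ['order3']]
# user3 [['address3'], []]
# guest [[], ['order4']]
```

Unlike this model, Beam does not guarantee any ordering of keys or of values within a group. In the Beam SDK for Python, applying `beam.CoGroupByKey()` to a tuple of `PCollection`s yields results of a similar shape; in the Java SDK, the per-collection values are read out of the `CoGbkResult` using `TupleTag`s.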