Repository: incubator-beam-site
Updated Branches:
  refs/heads/asf-site ab1f700ca -> 976b0302a

minor: remove duplicate words

Project: http://git-wip-us.apache.org/repos/asf/incubator-beam-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-beam-site/commit/6a5a0b3c
Tree: http://git-wip-us.apache.org/repos/asf/incubator-beam-site/tree/6a5a0b3c
Diff: http://git-wip-us.apache.org/repos/asf/incubator-beam-site/diff/6a5a0b3c

Branch: refs/heads/asf-site
Commit: 6a5a0b3cd77d89b81638bc4787bc635d0e10fda5
Parents: ab1f700
Author: terrencehan(韩亮) <[email protected]>
Authored: Wed Sep 28 17:56:30 2016 +0800
Committer: terrencehan(韩亮) <[email protected]>
Committed: Wed Sep 28 17:56:30 2016 +0800

----------------------------------------------------------------------
 learn/programming-guide.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/incubator-beam-site/blob/6a5a0b3c/learn/programming-guide.md
----------------------------------------------------------------------
diff --git a/learn/programming-guide.md b/learn/programming-guide.md
index ac18ba6..a7e5f12 100644
--- a/learn/programming-guide.md
+++ b/learn/programming-guide.md
@@ -158,7 +158,7 @@ A `PCollection` is a large, immutable "bag" of elements. There is no upper limit
 A `PCollection` can be either **bounded** or **unbounded** in size. A **bounded** `PCollection` represents a data set of a known, fixed size, while an **unbounded** `PCollection` represents a data set of unlimited size. Whether a `PCollection` is bounded or unbounded depends on the source of the data set that it represents. Reading from a batch data source, such as a file or a database, creates a bounded `PCollection`. Reading from a streaming or continously-updating data source, such as Pub/Sub or Kafka, creates an unbounded `PCollection` (unless you explicitly tell it not to).
 
-The bounded (or unbounded) nature The bounded (or unbounded) nature of your `PCollection` affects how Beam processes your data. A bounded `PCollection` can be processed using a batch job, which might read the entire data set once, and perform processing in a job of finite length. An unbounded `PCollection` must be processed using a streaming job that runs continuously, as the entire collection can never be available for processing at any one time.
+The bounded (or unbounded) nature of your `PCollection` affects how Beam processes your data. A bounded `PCollection` can be processed using a batch job, which might read the entire data set once, and perform processing in a job of finite length. An unbounded `PCollection` must be processed using a streaming job that runs continuously, as the entire collection can never be available for processing at any one time.
 
 When performing an operation that groups elements in an unbounded `PCollection`, Beam requires a concept called **Windowing** to divide a continuously updating data set into logical windows of finite size. Beam processes each window as a bundle, and processing continues as the data set is generated. These logical windows are determined by some characteristic associated with a data element, such as a **timestamp**.
