Repository: samza
Updated Branches:
  refs/heads/master 83ed46616 -> 1aee39ff1
SAMZA-766: fixed broken links in samza-container.html

Project: http://git-wip-us.apache.org/repos/asf/samza/repo
Commit: http://git-wip-us.apache.org/repos/asf/samza/commit/1aee39ff
Tree: http://git-wip-us.apache.org/repos/asf/samza/tree/1aee39ff
Diff: http://git-wip-us.apache.org/repos/asf/samza/diff/1aee39ff

Branch: refs/heads/master
Commit: 1aee39ff199743aaf150c3199cdbf65fb09e0dd0
Parents: 83ed466
Author: Aleksandar Pejakovic <[email protected]>
Authored: Tue Sep 8 00:32:04 2015 -0700
Committer: Yan Fang <[email protected]>
Committed: Tue Sep 8 00:32:04 2015 -0700

----------------------------------------------------------------------
 docs/learn/documentation/versioned/container/samza-container.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/samza/blob/1aee39ff/docs/learn/documentation/versioned/container/samza-container.md
----------------------------------------------------------------------
diff --git a/docs/learn/documentation/versioned/container/samza-container.md b/docs/learn/documentation/versioned/container/samza-container.md
index f97e8a3..a7236a6 100644
--- a/docs/learn/documentation/versioned/container/samza-container.md
+++ b/docs/learn/documentation/versioned/container/samza-container.md
@@ -53,9 +53,9 @@ The number of partitions in the input streams is determined by the systems from

If a Samza job has more than one input stream, the number of task instances for the Samza job is the maximum number of partitions across all input streams. For example, if a Samza job is reading from PageViewEvent (12 partitions), and ServiceMetricEvent (14 partitions), then the Samza job would have 14 task instances (numbered 0 through 13). Task instances 12 and 13 only receive events from ServiceMetricEvent, because there is no corresponding PageViewEvent partition.
-With this default approach to assigning input streams to task instances, Samza is effectively performing a group-by operation on the input streams with their partitions as the key. Other strategies for grouping input stream partitions are possible by implementing a new [SystemStreamPartitionGrouper](../api/javadocs/org/apache/samza/container/SystemStreamPartitionGrouper.html) and factory, and configuring the job to use it via the job.systemstreampartition.grouper.factory configuration value.
+With this default approach to assigning input streams to task instances, Samza is effectively performing a group-by operation on the input streams with their partitions as the key. Other strategies for grouping input stream partitions are possible by implementing a new [SystemStreamPartitionGrouper](../api/javadocs/org/apache/samza/container/grouper/stream/SystemStreamPartitionGrouper.html) and factory, and configuring the job to use it via the job.systemstreampartition.grouper.factory configuration value.

-Samza provides the above-discussed per-partition grouper as well as the [GroupBySystemStreamPartitionGrouper](../api/javadocs/org/apache/samza/container/systemstreampartition/groupers/GroupBySystemStreamPartition), which provides a separate task class instance for every input stream partition, effectively grouping by the input stream itself. This provides maximum scalability in terms of how many containers can be used to process those input streams and is appropriate for very high volume jobs that need no grouping of the input streams.
+Samza provides the above-discussed per-partition grouper as well as the GroupBySystemStreamPartitionGrouper, which provides a separate task class instance for every input stream partition, effectively grouping by the input stream itself. This provides maximum scalability in terms of how many containers can be used to process those input streams and is appropriate for very high volume jobs that need no grouping of the input streams.

Considering the above example of a PageViewEvent partitioned 12 ways and a ServiceMetricEvent partitioned 14 ways, the GroupBySystemStreamPartitionGrouper would create 12 + 14 = 26 task instances, which would then be distributed across the number of containers configured, as discussed below.
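
For readers checking the numbers in the documentation text above, the two task counts can be reproduced with a small sketch. This is not Samza code: the function names and the dictionary of partition counts are hypothetical and only model the arithmetic the paragraphs describe (maximum partition count for the default per-partition grouping, sum of partition counts when every input stream partition gets its own task).

```python
# Illustrative model only; these helpers are hypothetical and not part of Samza.

def tasks_group_by_partition(partition_counts):
    """Default grouping: one task per partition id, shared across all streams."""
    return max(partition_counts.values())


def tasks_group_by_system_stream_partition(partition_counts):
    """One task per (stream, partition) pair."""
    return sum(partition_counts.values())


def streams_for_task(partition_counts, task_id):
    """Streams that feed a given task id under the default grouping."""
    return sorted(
        name for name, count in partition_counts.items() if task_id < count
    )


# Partition counts taken from the example in the documentation text.
streams = {"PageViewEvent": 12, "ServiceMetricEvent": 14}

print(tasks_group_by_partition(streams))                # 14
print(tasks_group_by_system_stream_partition(streams))  # 26
print(streams_for_task(streams, 12))                    # ['ServiceMetricEvent']
```

The last call mirrors the statement that task instances 12 and 13 only receive ServiceMetricEvent messages, since PageViewEvent has no partition with those ids.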
