Hi Richard,

You can also partition by a key like "user_id" so that all messages for a given user end up in the same partition. This is useful for calculating user-specific aggregations, or for doing a distributed join where the local state is also partitioned on user_id.
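To make that concrete, here is a rough sketch in plain Python. It is not Kafka or Samza code: the partition count, the message shape, and the helper names (partition_for, route, count_per_user) are all made up for illustration, and CRC32 is only a stand-in for whatever hash the real partitioner uses. The point is just that a stable hash of the key sends every message for a user to one partition, so a per-user count needs no data from any other partition.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical partition count for the topic


def partition_for(key, num_partitions=NUM_PARTITIONS):
    """Map a key to a partition using a stable hash (CRC32 as a stand-in)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def route(messages, num_partitions=NUM_PARTITIONS):
    """Group messages by the partition their user_id hashes to."""
    partitions = defaultdict(list)
    for msg in messages:
        partitions[partition_for(msg["user_id"], num_partitions)].append(msg)
    return partitions


def count_per_user(partition_messages):
    """Per-user aggregation computed from a single partition's messages."""
    counts = defaultdict(int)
    for msg in partition_messages:
        counts[msg["user_id"]] += 1
    return dict(counts)


# Hypothetical input: a few events keyed by user_id.
messages = [
    {"user_id": "alice", "event": "play"},
    {"user_id": "bob", "event": "pause"},
    {"user_id": "alice", "event": "stop"},
    {"user_id": "carol", "event": "play"},
    {"user_id": "bob", "event": "play"},
]

partitions = route(messages)
# Each partition can be aggregated independently of the others.
local_counts = {p: count_per_user(msgs) for p, msgs in partitions.items()}
```

Because each user appears in exactly one partition, merging local_counts across partitions never has to add up two partial counts for the same user.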
Cheers,
Roger

On Thu, Mar 26, 2015 at 9:28 AM, Richard Lee <rd...@tivo.com> wrote:

> Is there a typo below? Are all of these actually in the same topic, just
> different partitions? Partitioning, AFAIK, is mainly done for parallelism
> & throughput reasons. What is the reason for partitioning your dataset by
> ‘columns’?
>
> https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-HowdoIchoosethenumberofpartitionsforatopic
> ?
>
> Richard
>
> On Mar 26, 2015, at 8:22 AM, Shekar Tippur <ctip...@gmail.com> wrote:
> >
> > Hello,
> >
> > Want to confirm a basic understanding of Kafka.
> > If I have a dataset that needs to be partitioned by 4 columns, then the
> > progression is
> >
> > {topic1:partition_key1} -> {Group by samza on partition_key1}
> > ->
> > {topic2:partition_key2} -> {Group by samza on partition_key2}
> > ->
> > {topic3:partition_key3} -> {Group by samza on partition_key3}
> > ->
> > {topic4:partition_key4} -> {Group by samza on partition_key4}
> >
> > Can you please confirm if my understanding is right?
> >
> > - Shekar