Hi Richard,

You can also partition by a key like "user_id" so that all messages for a
given user end up in the same partition. This is useful for calculating
user-specific aggregations, or for doing a distributed join where the local
state is also partitioned on user_id.
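
To make that concrete, here is a rough sketch of what key-based partitioning
boils down to. It is an illustration only, not Kafka's actual partitioner:
the crc32 hash, the partition count of 4, and the names partition_for and
route are all assumptions I made up for the example.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical partition count for the topic


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a key to a partition using a stable hash (illustrative only)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def route(messages):
    """Group (user_id, payload) pairs by the partition their key hashes to."""
    partitions = defaultdict(list)
    for user_id, payload in messages:
        partitions[partition_for(user_id)].append((user_id, payload))
    return partitions


# Both "alice" messages hash to the same partition, so whichever task owns
# that partition sees all of her events and can aggregate them locally.
messages = [("alice", "click"), ("bob", "view"), ("alice", "purchase")]
routed = route(messages)
```

The point is just that the partition is a pure function of the key, so a task
that keeps per-user state only ever needs the state for the users that hash
to its own partition.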

Cheers,

Roger

On Thu, Mar 26, 2015 at 9:28 AM, Richard Lee <rd...@tivo.com> wrote:

> Is there a typo below?  Are all of these actually in the same topic, just
> different partitions?  Partitioning, AFAIK, is mainly done for parallelism
> & throughput reasons.  What is the reason for partitioning your dataset by
> ‘columns’?
>
>
> https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-HowdoIchoosethenumberofpartitionsforatopic?
>
> Richard
>
> > On Mar 26, 2015, at 8:22 AM, Shekar Tippur <ctip...@gmail.com> wrote:
> >
> > Hello,
> >
> > Want to confirm a basic understanding of Kafka.
> > If I have a dataset that needs to be partitioned by 4 columns, then the
> > progression is
> >
> > {topic1:partition_key1} -> {Group by samza on partition_key1}
> > ->
> > {topic2:partition_key2} -> {Group by samza on partition_key2}
> > ->
> > {topic3:partition_key3} -> {Group by samza on partition_key3}
> > ->
> > {topic4:partition_key4} -> {Group by samza on partition_key4}
> >
> > Can you please confirm if my understanding is right?
> >
> > - Shekar
>
