Possibly related - I cannot seem to find the source for KafkaPartitionManager - 
can someone point me to it?

Thanks
Tyson

On Apr 11, 2014, at 12:15 PM, Tyson Norris <[email protected]> wrote:

> Hi - 
> I have a couple questions about partitioning - I’m trying to have multiple 
> task instances run, each processing a separate partition, and it appears 
> that only a single task instance runs, processing all partitions. Or else my 
> partitions are not created properly. This is based on a modified version of 
> hello-samza, so I’m not sure exactly which config+code steps to take to 
> enable partitioning of messages to multiple instances of the same task.
> 
> To route to a partition I use: messageCollector.send(new 
> OutgoingMessageEnvelope(OUTPUT_STREAM, msgKey, partitionKey, outgoingMap));
> - question here: the example in docs of collector.send(new 
> OutgoingMessageEnvelope(new SystemStream("kafka", 
> "SomeTopicPartitionedByUserId"), msg.get("user_id"), msg)) seems confusing, 
> because the OutgoingMessageEnvelope constructor has a key AND partitionKey - 
> I assume the partitionKey should be msg.get("user_id") in this case, but what 
> should key be? Just a unique value for this message?
> 
> I tried the 3 parameter constructor as well and have similar problems, where 
> my single task instance is used regardless of partitionKey specified in the 
> OutgoingMessageEnvelope.
> 
> Do I need to specify a partition manager and yarn.container.count to get 
> multiple instances of my task working to service separate partitions?
> 
> I’m not sure how to tell if my messages are routed to the correct partition 
> in kafka, or whether the problem is a partition handling config in samza.
> 
> Any advice is appreciated!
> 
> Thanks
> Tyson
