Hi - 
I have a couple questions about partitioning - I’m trying to have multiple 
tasks instances run, each processing a separate partition, and it appears that 
only a single task instance runs, processing all partitions. Or else my 
partitions are not created properly. This is based on a modified version of 
hello-samza, so I’m not sure exactly which config+code steps to take to enable 
partitioning of message to multiple instances of the same task.

To route to a partition i use: messageCollector.send(new 
OutgoingMessageEnvelope(OUTPUT_STREAM, msgKey, partitionKey, outgoingMap));
- question here: the example in docs of collector.send(new 
OutgoingMessageEnvelope(new SystemStream("kafka", 
"SomeTopicPartitionedByUserId"), msg.get("user_id"), msg)) seems confusing, 
because the OutgoingMessageEnvelope constructor has a key AND partitionKey - I 
assume the partitionKey should be msg.get(“user_id”) in this case, but what 
should key be? Just a unique value for this message?

I tried the 3 parameter constructor as well and have similar problems, where my 
single task instance is used regardless of partitionKey specified in the 
OutgoingMessageEnvelope.

Do I need to specify partition manager and yarn.container.count to get multiple 
instances of my task working to service separate partitions?

I’m not sure how to tell if my messages are routed to the correct partition in 
kafka, or whether the problem is a partition handling config in samza.

Any advice is appreciated!

Thanks
Tyson

Reply via email to