Hey Anh,

Kafka topics will automatically be created when they're read from or
written to, but they'll be created with the default partition count. If
you want a custom partition count (and it sounds like you do), you'll
need to run the scripts to create the topics beforehand, as you described.
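For example, a pre-creation script could look roughly like this (a sketch
only; the topic names and partition counts below are placeholders, and it
uses the same Kafka 0.8 kafka-create-topic.sh script quoted further down
in this thread):

  # Run once before submitting the Samza jobs; each topic gets its own
  # partition count. Topic names and counts here are only examples.
  bin/kafka-create-topic.sh --zookeeper localhost:2181 --replica 1 \
    --partition 8 --topic wikipedia-raw
  bin/kafka-create-topic.sh --zookeeper localhost:2181 --replica 1 \
    --partition 4 --topic wikipedia-edits

Any topic you don't pre-create this way will still be auto-created on
first use, just with the broker's default partition count.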
Cheers,
Chris

On 2/28/14 1:51 AM, "Anh Thu Vu" <[email protected]> wrote:

> Thank you, Chris!
>
> So if I want to have a list of streams/topics with different numbers of
> partitions, I should write a script to create all the topics before
> starting my Samza jobs? There is no way to set that "automatically" from
> my Samza jobs?
>
> Casey
>
>
> On Fri, Feb 28, 2014 at 12:45 AM, Chris Riccomini
> <[email protected]> wrote:
>
>> Hey Casey,
>>
>> Garry is right. By default Kafka ships with 2 partitions per topic. This
>> can be configured exactly as Garry describes.
>>
>> To create a new topic with a custom partition count, use:
>>
>> bin/kafka-create-topic.sh --zookeeper localhost:2181 --replica 1
>>   --partition 1 --topic test
>>
>> See http://kafka.apache.org/08/quickstart.html for details.
>>
>> To expand the partition count of a topic that already exists, run:
>>
>> bin/kafka-add-partitions.sh
>>
>> See https://cwiki.apache.org/confluence/display/KAFKA/Replication+tools
>> for details.
>>
>> You can't shrink the partition count of a topic that already exists.
>>
>> You can send a message to all partitions manually, as you describe.
>> Something like:
>>
>> for (int partition : partitions) {
>>   collector.send(new OutgoingMessageEnvelope(OUTGOING_STREAM, partition, null, outgoingMap));
>> }
>>
>> This should result in the message being sent to each partition. If you
>> have a kafka-console-consumer.sh running against the stream, you should
>> see the message once for each partition.
>>
>> Cheers,
>> Chris
>>
>> On 2/27/14 9:21 AM, "Anh Thu Vu" <[email protected]> wrote:
>>
>> > Hi Garry,
>> >
>> > I really owe you a lot today :)
>> >
>> > About the broadcasting, I think both cases will serve my purpose.
>> > Basically, for some particular streams, I want to broadcast every
>> > message to all the replicas of the recipient StreamTask.
>> >
>> > Casey
>> >
>> >
>> > On Thu, Feb 27, 2014 at 6:00 PM, Garry Turkington <
>> > [email protected]> wrote:
>> >
>> >> Hi Casey,
>> >>
>> >> On the first question, I *believe* you see 2 partitions on the
>> >> created output streams because they are created automatically by
>> >> Kafka (auto.create.topics.enable set to true), and if so then it
>> >> seems to default to 2 partitions, not 1.
>> >>
>> >> For the rest I am completely out of my depth and will wait for one
>> >> of the other guys to jump in. :)
>> >>
>> >> On the second question, what are you trying to achieve here? Do you
>> >> want a mechanism for a single message to go to all partitions of a
>> >> stream, or do you want every client to read every message across
>> >> all the partitions? The two are different, though I'm not sure I've
>> >> made that clear. I ask because there was a discussion earlier this
>> >> week about a possible addition of broadcast streams, which are more
>> >> in the second case.
>> >>
>> >> Regards
>> >> Garry
>> >>
>> >> -----Original Message-----
>> >> From: Anh Thu Vu [mailto:[email protected]]
>> >> Sent: 27 February 2014 15:14
>> >> To: [email protected]
>> >> Subject: Stream partition
>> >>
>> >> Hi all,
>> >>
>> >> It's me again :)
>> >>
>> >> I have some questions regarding partitioning the stream.
>> >> Consider the wikipedia feed task in hello-samza: when I run it, I
>> >> see 2 partitions when I list the Kafka topics:
>> >>
>> >> topic: wikipedia-raw  partition: 0  leader: 0  replicas: 0  isr: 0
>> >> topic: wikipedia-raw  partition: 1  leader: 0  replicas: 0  isr: 0
>> >>
>> >> My first question is how to increase the number of partitions? I was
>> >> playing around with the OutgoingMessageEnvelope (OUTPUT_STREAM is
>> >> the kafka.wikipedia-raw SystemStream):
>> >>
>> >> 1)
>> >> collector.send(new OutgoingMessageEnvelope(OUTPUT_STREAM, outgoingMap));
>> >>
>> >> 2)
>> >> collector.send(new OutgoingMessageEnvelope(new
>> >>   SystemStreamPartition(OUTPUT_STREAM, new Partition(0)), outgoingMap));
>> >> collector.send(new OutgoingMessageEnvelope(new
>> >>   SystemStreamPartition(OUTPUT_STREAM, new Partition(1)), outgoingMap));
>> >> collector.send(new OutgoingMessageEnvelope(new
>> >>   SystemStreamPartition(OUTPUT_STREAM, new Partition(2)), outgoingMap));
>> >>
>> >> 3)
>> >> collector.send(new OutgoingMessageEnvelope(OUTPUT_STREAM, 0, null, outgoingMap));
>> >> collector.send(new OutgoingMessageEnvelope(OUTPUT_STREAM, 1, null, outgoingMap));
>> >> collector.send(new OutgoingMessageEnvelope(OUTPUT_STREAM, 2, null, outgoingMap));
>> >>
>> >> But in all the cases above, I always got 2 partitions (and why 2,
>> >> not 1?)
>> >>
>> >> Then, when I try this:
>> >>
>> >> collector.send(new OutgoingMessageEnvelope(OUTPUT_STREAM, 0, 0, outgoingMap));
>> >> collector.send(new OutgoingMessageEnvelope(OUTPUT_STREAM, 1, 1, outgoingMap));
>> >> collector.send(new OutgoingMessageEnvelope(OUTPUT_STREAM, 2, 2, outgoingMap));
>> >>
>> >> I don't receive anything on the OUTPUT_STREAM.
>> >>
>> >> Does it have anything to do with the SinglePartitionSystemAdmin in
>> >> WikipediaSystemFactory?
>> >>
>> >> The second question is whether it's possible to send a message to
>> >> all the partitions of a stream? (Maybe by sending the message
>> >> multiple times, each time specifying a partition?)
>> >>
>> >> Thanks in advance,
>> >> Casey
