That is helpful, thank you. And thanks for all the good work you guys have been doing on Kafka, I think its a great piece of software.
David Harris On Mon, Oct 22, 2012 at 1:19 PM, Neha Narkhede <neha.narkh...@gmail.com>wrote: > You are hitting https://issues.apache.org/jira/browse/KAFKA-278 > > The partitioning semantics that Kafka 0.7.x provides is sort of weak > for it to be used for sticky partitioning features like you need here. > The reason is that partitions hosted on a broker can go offline when a > broker goes offline. > This means that during this downtime, the data meant for the subsets > hosted on those partitions will be sent to other subsets on the fly, > or they will be dropped (depending on how you configure the > partitioner). > > In Kafka 0.8, partitions are always available in spite of individual > broker failures. These strong durability guarantees enables sticky > partitioning to work as expected. > > Hope that helps, > Neha > > On Mon, Oct 22, 2012 at 10:22 AM, David Harris <dhar...@avum.com> wrote: > > Hi Everyone, > > > > I want to have particular subsets of my data sent to different > partitions so > > that I can have consumers across different machines (or multiple > instances > > of the consumers running in different threads) handle the subsets of > data. > > The definition of these subsets is important, meaning data of type 1 > needs > > to go into subset 1 etc. > > > > My set up is that I have kafka (0.7.1) and zookeeper running on a single > > machine like described in the quick start guide. In my server.properties > > file I’ve set num.partitions=4. > > > > I’m working on testing this all out with a simple class that passes one > > character strings [a-z] as messages, and I have a > > kafka.producer.Partitioner that puts a-g, h-m, n-s and t-z into separate > > partitions. The issue I’m having is that when running my code for the > first > > time (i.e. the topic doesn't exist in kafka yet) I’m seeing that in the > > “public int partition(String s, int numPartitions)” method of my > Partitioner > > the numPartitions is 1 the first few times it is called, then after a > while > > its coming as 4. In my example this is causing some w, y, z etc to be > > included in the partition with a, b and c’s. If I’ve already run the > code > > once and I see the four folders under the /tmp/kafka-logs for my > partition > > everything works as expected. > > > > I’ve attached my test code that shows this issue. (I believe that > > attachments come across, if not I can paste in the body of the email). > I’m > > not sure if I’m doing something wrong in the code or if I’m approaching > this > > problem in the wrong way. It seems that an alternative approach would > be to > > have a separate topic for each subset of data, and then have my producer > > push to the different topics. Any advice/suggestions? > > > > On a related note, when reading up about this topic in the quick start > guide > > I see that it describes creating a ProducerData object like the > following: > > ProducerData<String, String> data = new ProducerData<String, > > String>("test-topic", "test-key", "test-message"); > > But looking at the API docs I see that the only constructor that allows > me > > to specify a key takes a List[v] as the third parameter: > > ProducerData(topic: String, key: K, data: List[V]) > > Am I missing something here? > > > > Thanks > > David Harris > > >