Hello, 

I am writing to you because we are facing a technical issue in the
development of our Kafka-based data streaming application.

Our application consists of two Java main classes: the Producer and the Consumer.
The Producer periodically reads data lines from a file and sends each one (as a
message) to a Kafka topic.
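To make the Producer's loop concrete, here is a minimal sketch of the read-and-send step. The `sendToTopic` method is a hypothetical stand-in so the sketch is self-contained; a real implementation would build a `ProducerRecord` and call `KafkaProducer.send()` instead:

```java
import java.util.ArrayList;
import java.util.List;

class LineProducerSketch {
    // records what "was sent", so the stand-in is observable
    static final List<String> sent = new ArrayList<>();

    // stand-in for producer.send(new ProducerRecord<>(topic, line))
    static void sendToTopic(String topic, String line) {
        sent.add(topic + ":" + line);
    }

    // each line read from the file becomes one Kafka message
    static void produce(List<String> lines, String topic) {
        for (String line : lines) {
            sendToTopic(topic, line);
        }
    }
}
```

In the real Producer the `lines` would come from the input file (e.g. via `Files.readAllLines`) and the loop would run periodically.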
The Consumer uses a Kafka stream processor (i.e. a Java application that
implements the Kafka Streams Processor interface). Hence, this Java
application overrides the two Java methods process() and punctuate() of the
Processor interface. The process() method stores the lines of each Kafka
message from the topic in a KeyValueStore instance. The offset is updated so
that a message is never read twice. The punctuate() method reads the
KeyValueStore instance and processes the data.
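The buffer-then-drain pattern described above can be sketched as follows. This is an illustrative stand-in, not the real Kafka Streams API: a plain `LinkedHashMap` replaces the `KeyValueStore`, and the `toUpperCase` step is a placeholder for the actual processing; the real class would implement `org.apache.kafka.streams.processor.Processor` and obtain the store from the `ProcessorContext`:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class LineProcessorSketch {
    // stand-in for the KeyValueStore instance, keyed by offset
    private final Map<Long, String> store = new LinkedHashMap<>();
    private long nextOffset = 0L;       // advanced so no message is stored twice
    final List<String> processed = new ArrayList<>();

    // called once per incoming message line (mirrors process())
    void process(String line) {
        store.put(nextOffset++, line);
    }

    // called periodically (mirrors punctuate()): drain the store and process
    void punctuate() {
        for (Map.Entry<Long, String> e : store.entrySet()) {
            processed.add(e.getValue().toUpperCase()); // placeholder processing
        }
        store.clear();                  // drained entries are not re-processed
    }
}
```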

The problem is this: between two runs of the Producer and Consumer mains (in
parallel), data lines are left behind as residue in a temporary topic called
test-name_of_the_store-changelog. When executing the Consumer and the Launcher
a second time, we believe that this residual data is reloaded from the topic
test-name_of_the_store-changelog into the current topic. As a result, the first
data read by the Launcher on the second execution is unwanted data (the last
data of the first execution). When we delete the
test-name_of_the_store-changelog topic from the command line, the problem is
fixed. Since command-line invocations differ between OS distributions, we would
like to delete this topic from the application's Java code at the very
beginning of the run. Is there a way to achieve this using the Kafka Streams
API?
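For context on the topic name we see: Kafka Streams derives changelog topic names as `<application.id>-<storeName>-changelog`, so with application.id `test` and a store named `name_of_the_store` we get exactly the topic above. A small helper to compute that name (e.g. before handing it to an external deletion mechanism such as the Kafka AdminClient's deleteTopics, which is outside the Streams API itself) might look like this:

```java
class ChangelogTopic {
    // Kafka Streams naming convention: <application.id>-<storeName>-changelog
    static String nameFor(String applicationId, String storeName) {
        return applicationId + "-" + storeName + "-changelog";
    }
}
```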

Best regards, 

Louis 

