Hey Chris,

Thanks for the quick response!

So the thing is, our Kafka producer is independent of Samza. The idea is 
to test the Kafka streams with different CEP engines, Storm being one example. 
That's why I have a separate Kafka job running that reads from a file and 
writes to a Kafka topic.

So assuming there is a topic, say "input-topic", already in place (where the 
message is an event of type String and the key is the event's eventId), 
I want to write a Samza StreamTask that will read this string, parse it, and 
write to another Kafka topic. In other words, Job 1 is already done, independent 
of Samza. I'm working on Job 2 using Samza.

1. Specifically, since I'm using Kafka, I don't have to write my own consumer 
and SystemFactory classes, correct?
2. For the Samza StreamTask, would the input be a String? i.e.
    String event = (String) envelope.getMessage();
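
To make question 2 concrete, here is a rough sketch of the parsing step I have 
in mind for Job 2. It is plain Java with no Samza dependency, so it can be 
tried on its own. The class name EventParser, the field names, and the 
comma-separated "eventId,timestamp,payload" layout are all assumptions for 
illustration; my real events may look different.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical helper for Job 2: turns one raw event string into named fields.
 * Assumes comma-separated events of the form "eventId,timestamp,payload";
 * the field names and their order are made up for illustration.
 */
public class EventParser {
    private static final String[] FIELDS = {"eventId", "timestamp", "payload"};

    public static Map<String, String> parse(String event) {
        // Limiting the split to FIELDS.length keeps extra commas in the last field.
        String[] parts = event.split(",", FIELDS.length);
        if (parts.length != FIELDS.length) {
            throw new IllegalArgumentException("Malformed event: " + event);
        }
        // LinkedHashMap preserves the field order defined above.
        Map<String, String> fields = new LinkedHashMap<>();
        for (int i = 0; i < FIELDS.length; i++) {
            fields.put(FIELDS[i], parts[i].trim());
        }
        return fields;
    }

    public static void main(String[] args) {
        // prints {eventId=42, timestamp=2014-03-18T15:16:00, payload=login}
        System.out.println(parse("42, 2014-03-18T15:16:00, login"));
    }
}
```

Inside the task's process method I would then expect to call something like 
EventParser.parse((String) envelope.getMessage()) and pass the result on via 
collector.send(...), assuming the message really does arrive as a String.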

Thanks!
Sonali

-----Original Message-----
From: Chris Riccomini [mailto:[email protected]]
Sent: Tuesday, March 18, 2014 3:16 PM
To: [email protected]
Subject: Re: Writing my Custom Job

Hey Sonali,

1. For CSV file reading, you should check this JIRA out:

  https://issues.apache.org/jira/browse/SAMZA-138

2. You don't need to write to a Kafka topic using the standard Kafka producer. 
You can use the collector that comes as part of the process method. Take a look 
at one of the hello-samza examples to see how this is done. 
(collector.send(...))

3. To parse the string, retrieve specific fields, etc., you should write a 
second StreamTask that reads from the first. The flow should look like:

<file> -> Job 1 -> Kafka topic 1 -> Job 2 -> Kafka topic 2

Where "Job 1" sends messages to "Kafka topic 1" partitioned by event ID, and 
"Job 2" parses and retrieves specific fields, and produces to "Kafka topic 2".
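
To illustrate what "partitioned by event ID" buys you: a keyed producer 
typically derives the partition from a hash of the key, so every message with 
the same event ID lands in the same partition. The snippet below is only a 
sketch of that idea, not Kafka's actual partitioner; the class name, method 
name, and partition count are hypothetical.

```java
/**
 * Illustrative hash-based partitioning by event ID.
 * This is a sketch of the general technique, not Kafka's own partitioner.
 */
public class EventIdPartitioning {

    /** Maps an event ID to a partition index in [0, numPartitions). */
    public static int partitionFor(String eventId, int numPartitions) {
        // Mask the sign bit instead of using Math.abs, which stays negative
        // for Integer.MIN_VALUE.
        return (eventId.hashCode() & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        int partitions = 4; // assumed partition count for "Kafka topic 1"
        for (String id : new String[] {"evt-1", "evt-2", "evt-1"}) {
            // Both "evt-1" lines report the same partition.
            System.out.println(id + " -> partition " + partitionFor(id, partitions));
        }
    }
}
```

Because the mapping is deterministic, all messages for a given event ID stay 
in one partition of "Kafka topic 1", and "Job 2" sees them in order.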

Cheers,
Chris

On 3/18/14 2:48 PM, "[email protected]"
<[email protected]> wrote:

>Hey Guys,
>
>So I'm writing my custom job in Samza and wanted to make sure I'm not
>re-inventing the wheel.
>
>I have a Kafka job running that reads from a CSV file and writes to a
>topic. I wrote this using the Kafka producer API, independent of Samza.
>The output is a KeyedMessage with the key being my eventId and the value
>a string corresponding to my event.
>
>Now, I want to write a SamzaConsumer that listens on my topic, parses
>the string to retrieve specific fields I'm interested in and write it
>out to a different kafka topic.
>
>Are there existing classes I can leverage to do this?
>
>Thanks,
>Sonali
>
>Sonali Parthasarathy
>R&D Developer, Data Insights
>Accenture Technology Labs
>703-341-7432
>
>
>________________________________
>
>This message is for the designated recipient only and may contain
>privileged, proprietary, or otherwise confidential information. If you
>have received it in error, please notify the sender immediately and
>delete the original. Any other use of the e-mail by you is prohibited.
>Where allowed by local law, electronic communications with Accenture
>and its affiliates, including e-mail and instant messaging (including
>content), may be scanned by our systems for the purposes of information
>security and assessment of internal compliance with Accenture policy.
>_______________________________________________________________________
>___
>____________
>
>www.accenture.com



