Hey Nirmal,

Your method of comparison makes sense.

For Samza, you just need to set up a Kafka system in the configuration that
points to your brokers (take a look at the wiki-parser.properties file for
an example).

As far as playing with state, the best place to look right now is the
hello-stateful-world.samza file in Samza's code base. This is a job config
for a job that uses state management. This feature is generally most
useful when you're doing joins (ad view stream + ad click stream), sorting
(sort messages by time over a 5 minute window, and emit top 10), or
aggregation (count page views by member id).
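
To make the join case a bit more concrete, here is a rough sketch in plain
Java. It is not Samza code: the AdJoinSketch class, its method names, and the
in-memory HashMap standing in for a task's local store are all made up for
illustration. The point is only that the task has to remember ad views so
that a later click can be matched against them.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names come from Samza.
class AdJoinSketch {
    // Stand-in for the task's local state: ad view id -> page the ad was shown on.
    private final Map<String, String> viewsById = new HashMap<>();
    // Joined records that a real task would send to an output stream.
    private final List<String> joined = new ArrayList<>();

    // Called for each message on the ad view stream: remember the view.
    void onView(String viewId, String page) {
        viewsById.put(viewId, page);
    }

    // Called for each message on the ad click stream: look up the matching view.
    void onClick(String viewId) {
        String page = viewsById.remove(viewId);
        if (page != null) {
            joined.add(viewId + " clicked on " + page);
        }
        // A click with no buffered view is simply dropped in this sketch.
    }

    List<String> joined() {
        return joined;
    }
}
```

In a real job the map would be a managed store rather than a HashMap, so
that the buffered views survive a container restart.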

For partitioning and parallelism, you just need to make sure that your
input streams have a partition count > 1 (either by setting the Kafka
broker default, or by manually creating topics with a partition count >
1), and then set yarn.container.count > 1 in your Samza job's config, as
well.
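
As a rough illustration of why both settings matter, here is a plain-Java
sketch of partitions being spread across containers. The
PartitionAssignmentSketch class and the modulo rule it uses are assumptions
made for the example, not a copy of Samza's actual assignment code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; the assignment rule is an assumption, not Samza's code.
class PartitionAssignmentSketch {
    // Returns, for each container, the list of partition ids it would process.
    static List<List<Integer>> assign(int partitionCount, int containerCount) {
        List<List<Integer>> containers = new ArrayList<>();
        for (int c = 0; c < containerCount; c++) {
            containers.add(new ArrayList<>());
        }
        // Deal partitions out to containers in round-robin order.
        for (int p = 0; p < partitionCount; p++) {
            containers.get(p % containerCount).add(p);
        }
        return containers;
    }
}
```

With 4 partitions and 2 containers, assign(4, 2) gives [[0, 2], [1, 3]], so
both containers have work. With a single partition, assign(1, 2) gives
[[0], []], and the second container sits idle, which is why the partition
count needs to be > 1 before adding containers helps.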

Cheers,
Chris

On 10/21/13 12:33 AM, "Nirmal Kumar" <[email protected]> wrote:

>Hi Chris,
>
>Thanks a lot for the information.
>
>I am comparing Storm + Kafka 0.8 vs. Samza.
>
>As part of the initial use case I am using the Kafka API to publish
>messages (say 40k, 50k...) to the Kafka broker.
>Then using the Storm spout I am consuming these messages.
>On the Samza side I am using a similar Kafka Consumer that consumes
>messages from Kafka.
>
>Is this use case fine for comparing Storm + Kafka 0.8 vs. Samza, or are
>there other things that I need to consider?
>
>Also, going forward, I am planning to find some use cases where I can
>compare Samza's features like State Management, Partitioning and
>Parallelism, etc.
>Any pointers towards these specific use cases?
>
>Thanks,
>-Nirmal
>
>-----Original Message-----
>From: Chris Riccomini [mailto:[email protected]]
>Sent: Thursday, October 17, 2013 10:42 PM
>To: [email protected]
>Subject: Re: Writing a simple KafkaProducer in Samza
>
>Hey Nirmal,
>
>Glad to hear that the hello-samza project is working for you.
>
>If I understand you correctly, I believe that you're saying you want to
>send messages to a Kafka topic, right? As you've pointed out, you can
>send messages to Kafka through Samza. You can also send messages to Kafka
>directly using the Kafka producer API. Which one you use depends on what
>your code is doing.
>
>With Samza, typically a task is sending messages to Kafka as a reaction
>to some other event (which triggers the process method). For example, in
>the Wiki example, we send a message to Kafka whenever an update happens
>on Wikipedia (via the IRC channel). In this example, we had to write a
>Wikipedia consumer for Samza, which implements the SystemConsumer API in
>Samza. This implementation reads messages from the Wikimedia IRC channel.
>You can see this implementation here:
>
>
>https://github.com/linkedin/hello-samza/blob/master/samza-wikipedia/src/main/java/samza/examples/wikipedia/system/WikipediaConsumer.java
>
>If you use Samza, you MUST define at least one stream in task.inputs, which
>feeds messages to your StreamTask. Out of the box, Samza comes with a
>KafkaSystemConsumer implementation. The hello-samza project comes with
>the WikipediaSystemConsumer implementation. If you want to react to
>messages from another system, or feed, you'd have to implement this
>interface (and hopefully contribute it back :).
>
>The alternative approach would be to just send messages directly to Kafka
>using the Kafka API. This approach is more appropriate in cases that
>don't fit well with Samza's processing model (e.g. you can't easily
>implement the SystemConsumer API, you need to guarantee deployment on a
>specific host all the time, etc). For example, if you wanted to read
>syslog messages on a specific host, and send them to Kafka, it probably
>makes more sense to just write a simple Java main() method that creates a
>Kafka producer, polls syslog periodically, and calls producer.send()
>whenever a new message appears in the syslog.
>
>If you can be more specific about what you're doing, I can probably
>provide better advice.
>
>Cheers,
>Chris
>
>On 10/17/13 6:39 AM, "Nirmal Kumar" <[email protected]> wrote:
>
>>Hi All,
>>
>>I was referring to the hello-samza project and was able to run it
>>successfully.
>>I was able to run all the jobs and also wrote a consumer task to listen
>>to kafka.wikipedia-stats topic.
>>
>>I now want to write a Samza job that acts as a KafkaProducer and
>>continuously publishes simple string messages to a topic,
>>just like the WikipediaFeedStreamTask that reads Wikipedia events and
>>publishes them to a topic.
>>I am not sure what value to use for task.inputs in the config properties
>>file.
>>What I have in mind is something like a Java program publishing string
>>messages to a Kafka topic.
>>How can I write such a Samza job?
>>
>>Any pointers would be of great help.
>>
>>Later on I want to read the same messages from a consumer, like
>>WikipediaParserStreamTask does.
>>Referring to the hello-samza project, I was able to write a consumer task
>>that reads messages from the topic (kafka.wikipedia-stats) by simply setting:
>>task.class=samza.examples.wikipedia.task.TestConsumer
>>task.inputs=kafka.wikipedia-stats
>>
>>
>>Thanks,
>>-Nirmal
>>
>
>
>________________________________
>
>
>
>
>
>
>NOTE: This message may contain information that is confidential,
>proprietary, privileged or otherwise protected by law. The message is
>intended solely for the named addressee. If received in error, please
>destroy and notify the sender. Any use of this email is prohibited when
>received in error. Impetus does not represent, warrant and/or guarantee,
>that the integrity of this communication has been maintained nor that the
>communication is free of errors, virus, interception or interference.
