Hey Garry,

It sounds like your understanding of bootstrap streams is correct.

Bootstrap stream messages are delivered to the process() method just
like any others. The only difference is that you should receive all of
them, from offset 0 through the last offset, before you receive any
messages from non-bootstrap streams.
Your positive/negative example sounds like a reasonable use case for a
bootstrap stream.

A few questions:

1. Can you post the container logs and the full configuration file for
your job somewhere (e.g. a GitHub gist)?
2. Are you putting data into the positive-words and negative-words topics
before you start the Samza job?

Also, you can do envelope.getSystemStreamPartition().getStream() directly
(no need to call getSystemStream()).
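
Here is a rough sketch of that kind of per-stream switch, with the Samza
types left out so it stands alone. SentimentRouter, handle(), score() and
the "tweets" stream name are made up for illustration; in a real
StreamTask the stream name would come from
envelope.getSystemStreamPartition().getStream() inside process(), and the
message from envelope.getMessage().

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not part of Samza: routes messages by stream name.
class SentimentRouter {
    private final Set<String> positive = new HashSet<>();
    private final Set<String> negative = new HashSet<>();
    private int lastScore = 0;

    // streamName stands in for envelope.getSystemStreamPartition().getStream().
    void handle(String streamName, String message) {
        switch (streamName) {
            case "positive-words":
                // Reference data from a bootstrap stream: remember the word.
                positive.add(message.toLowerCase());
                break;
            case "negative-words":
                negative.add(message.toLowerCase());
                break;
            default:
                // Any other stream is treated as data to be scored.
                lastScore = score(message);
        }
    }

    // +1 for each positive word, -1 for each negative word.
    int score(String text) {
        int total = 0;
        for (String word : text.toLowerCase().split("\\s+")) {
            if (positive.contains(word)) total++;
            if (negative.contains(word)) total--;
        }
        return total;
    }

    int lastScore() {
        return lastScore;
    }
}
```

The scoring only makes sense if the word lists are loaded before the data
arrives, which is the ordering a bootstrap stream is meant to give you.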

Cheers,
Chris

On 2/10/14 3:18 AM, "Garry Turkington" <g.turking...@improvedigital.com>
wrote:

>Hi,
>
>I was building a task to do some sentiment analysis on incoming data. I
>have a corpus each of positive and negative words to which the task needs
>access. This seemed like a good fit for bootstrap streams. But I can't
>seem to get them to work.
>
>I have my job configured with the 3 Kafka topics in task.inputs and that
>seems to work: data thrown at any of the topics reaches the task.
>
>But setting up the 2 reference streams as bootstrap doesn't seem to be
>working. Here's the relevant part of the config; I want to read the
>entire message history each time:
>
>systems.kafka.streams.positive-words.samza.bootstrap=true
>systems.kafka.streams.positive-words.samza.reset.offset=true
>
>systems.kafka.streams.negative-words.samza.bootstrap=true
>systems.kafka.streams.negative-words.samza.reset.offset=true
>
>Do bootstrap streams get handled in any special way? I'm assuming here
>that the messages will arrive in the process method on StreamTask just
>like any other and I can handle them differently by switching on
>envelope.getSystemStreamPartition().getSystemStream().getStream().
>From the code, the BootstrapChooser does its magic to determine which
>message is delivered to the task, but the actual delivery seems the
>same.
>
>What am I missing?
>
>Thanks,
>Garry
>
