Hi, Chen,


On Wed, Oct 28, 2015 at 4:05 AM, Yan Fang <yanfangw...@163.com> wrote:

>
>
> * Is there a tentative date for 0.10.0 release?
>     I think it's coming out soon. @Yi Pan , he should know more about that.
>

There is a slight delay in the release date due to a bug we recently
discovered in testing. The targeted date is now in November.

>
>
> * I checked the checkpoint topic for the Samza job and it seems the
> checkpoint topic is created with 1 partition by default. Given that each
> Samza task will need to read from the checkpoint topic, it is similar to
> what I need (each Samza task reading from the same partition of a topic).
> I am wondering how that is achieved?
>     In the current implementation, only the AM reads the checkpoint stream
> and distributes the information to all the nodes using the HTTP server. Not
> all the nodes are consuming the checkpoint stream. Correct me if I am wrong.
>

The checkpoint topic is a special one that the containers read only during
the start-up phase. Hence, it is not considered part of the
SystemStreamPartitions that are assigned to the tasks. As Yan mentioned, the
broadcast stream in 0.10 is the solution to your use case.
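
A rough sketch of what that configuration could look like in 0.10, using the
*topicR*/*topicD* names from this thread (the "kafka" system name and the
single partition 0 are assumptions for illustration, not taken from your
setup):

    # Bootstrap: read topicR from the beginning before processing any
    # messages from topicD
    systems.kafka.streams.topicR.samza.bootstrap=true
    systems.kafka.streams.topicR.samza.offset.default=oldest

    # Broadcast (0.10+): deliver partition 0 of topicR to every task,
    # regardless of how topicD is partitioned
    task.broadcast.inputs=kafka.topicR#0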

Thanks!


>
>
> Thanks,
> Yan
>
>
>
>
>
>
> At 2015-10-28 02:49:23, "Chen Song" <chen.song...@gmail.com> wrote:
> >Thanks Yan.
> >
> >* Is there a tentative date for 0.10.0 release?
> >* I checked the checkpoint topic for the Samza job and it seems the
> >checkpoint topic is created with 1 partition by default. Given that each
> >Samza task will need to read from the checkpoint topic, it is similar to
> >what I need (each Samza task reading from the same partition of a topic).
> >I am wondering how that is achieved?
> >
> >Chen
> >
> >On Sat, Oct 24, 2015 at 5:52 AM, Yan Fang <yanfangw...@163.com> wrote:
> >
> >> Hi Chen Song,
> >>
> >>
> >> Sorry for the late reply. What you describe is a typical bootstrap use
> >> case. Check the bootstrap configuration at
> >> http://samza.apache.org/learn/documentation/0.9/container/streams.html.
> >> By using it, Samza will always read *topicR* from the beginning when it
> >> restarts, and then treats *topicR* as a normal topic after reading the
> >> existing messages in it.
> >>
> >>
> >> == can we configure each individual Samza task to read data from all
> >> partitions from a topic?
> >> It works in 0.10.0 by using the broadcast stream. In 0.9.0, you have
> >> to "create *topicR* with the same number of partitions as *topicD*,
> >> and replicate data to all partitions".
> >>
> >>
> >> Hope this still helps.
> >>
> >>
> >> Thanks,
> >> Yan
> >>
> >>
> >> At 2015-10-22 04:44:41, "Chen Song" <chen.song...@gmail.com> wrote:
> >> >In our Samza app, we need to read data from MySQL (a reference table)
> >> >alongside a stream. So the requirements are:
> >> >
> >> >* Read data into each Samza task before processing any message.
> >> >* The Samza task should be able to listen to updates happening in
> >> >MySQL.
> >> >
> >> >I did some research, scanning through relevant conversations and JIRAs
> >> >in the community, but did not find a solution, nor a recommended way to
> >> >do this.
> >> >
> >> >If my data stream comes from a topic called *topicD*, the options in my
> >> >mind are:
> >> >
> >> >   - Use Kafka
> >> >      1. Use a CDC-based solution to replicate the data in MySQL to a
> >> >      Kafka topic
> >> >      (https://github.com/wushujames/mysql-cdc-projects/wiki).
> >> >      Say the topic is called *topicR*.
> >> >      2. In my Samza app, read the reference table from *topicR* and
> >> >      persist it in a cache in each Samza task's local storage.
> >> >         - If the data in *topicR* is NOT partitioned in the same way
> >> >         as *topicD*, can we configure each individual Samza task to
> >> >         read data from all partitions of a topic?
> >> >         - If the answer to the above question is no, do I need to
> >> >         create *topicR* with the same number of partitions as
> >> >         *topicD*, and replicate data to all partitions?
> >> >         - On start, how do I make a Samza task block on processing the
> >> >         first message from *topicD* until it has read all data from
> >> >         *topicR*?
> >> >      3. Any new updates/deletes in *topicR* will be consumed to update
> >> >      the local cache of each Samza task.
> >> >      4. On failure or restart, each Samza task will read from the
> >> >      beginning of *topicR*.
> >> >   - Not use Kafka
> >> >      - Each Samza task reads a snapshot of the database and builds its
> >> >      local cache, then reads periodically to update that cache. I have
> >> >      read a few blogs, and this does not sound like a solid approach
> >> >      in the long term.
> >> >
> >> >Any thoughts?
> >> >
> >> >Chen
> >> >
> >> >
> >> >--
> >> >Chen Song
> >>
> >
> >
> >
> >--
> >Chen Song
>
