KStreams was certainly incubated based on early Samza learnings. Although
the two share a common architectural base, they started evolving
independently a year ago.

Over the last year there have been significant improvements in Samza to
make stateful stream processing more reliable and production-ready. The
most significant ones are described in this article:
https://engineering.linkedin.com/blog/2016/01/whats-new-samza

To summarize some of these improvements: Samza stores the partition mapping
durably. In addition, Samza integrates deeply with YARN to ensure that
YARN doesn't move containers around if the jobs are stateful. This allows
state to be sticky and hence avoids reseeding state even when the
client application is upgraded, stopped, restarted, etc. Samza also makes
sure that a minimal amount of state gets moved when you increase or decrease
the number of containers in your job. These improvements have been critical
in keeping many production jobs (especially those with large state) stable.

One of the other key differences between Samza and KStreams is that
although Samza has first-class support for Kafka, it fundamentally
supports input from and output to non-Kafka sources. For example, at
LinkedIn we have a Samza job which reads from Kinesis and DynamoDB streams
directly. In many situations this can significantly reduce the additional
Kafka hardware cost and the operational cost of running a separate bridging
service which moves the data from the external source into Kafka. We are
also prototyping running Samza jobs in our Hadoop grids and having them
read directly from HDFS and produce to HDFS. If this is successful, we
will allow for running batch jobs in Samza (this will support some
scenarios where customers want to experiment in Hadoop before
moving to processing in near real time using Kafka).

Having said the above, the additional flexibility and support for sticky
stateful apps in Samza come at the cost of additional things to configure.
More work will happen in Samza in the future to make the config simpler.

On the question of the future, Samza will continue to evolve independently,
and we have a long list of stream processing features and ease-of-use
improvements that we hope to contribute to Samza in the coming year.

Hope that helps.

Thanks
Kartik

On Mon, May 23, 2016 at 10:28 AM, Sriram Ramachandrasekaran <
[email protected]> wrote:

> Hello Yi, et al.,
>
> I've been following Samza and Kafka (and Kafka Streams). Given the state
> Kafka Streams is in, it provides a nice high-level API for consuming
> data from Kafka, plus support for localized state. If thrown into an
> environment like Mesos (fronted by Marathon), we should get distribution
> out of the box.
>
> I wanted to hear from you what this means for Samza's roadmap and what you
> guys are thinking about it. As I understand it, a lot of Samza's learnings
> have gone into Kafka Streams, which means it should be more polished out of
> the box. Please share your thoughts.
>
>
>
> --
> It's just about how deep your longing is!
>



-- 
We are hiring in Streams Infra (Kafka/Samza/Datastream) !!
