Hi Yi,

Thanks for the clarification; it was helpful.

I would also like to know your views on the issues below, and whether you
have employed anything to overcome them.

Log Compaction Issues:

https://issues.apache.org/jira/browse/KAFKA-2163 - Offsets manager cache
should prevent stale-offset-cleanup while an offset load is in progress;
otherwise we can lose consumer offsets - *This might be an issue, as no
offset would be read, causing the bootstrap of the local key-value store
to fail.*

https://issues.apache.org/jira/browse/KAFKA-2118 - Cleaner cannot clean
after shutdown during replaceSegments -
*This will prevent reading the log-compacted topic, causing the local
key-value store bootstrap to fail.*

https://issues.apache.org/jira/browse/KAFKA-2235 - LogCleaner offset map
overflow -
*This will probably be an issue for clients that have small messages and a
large number of keys. They would need considerable tuning to make sure
this does not happen.*
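
To put a rough number on KAFKA-2235: as I understand the 0.8.x/0.9 log
cleaner, each cleaner thread gets an equal share of
log.cleaner.dedupe.buffer.size, each offset-map entry costs a 16-byte MD5
hash plus an 8-byte offset, and the map is only filled up to
log.cleaner.io.buffer.load.factor. The sketch below is a back-of-the-envelope
estimate under those assumptions (the per-entry size in particular is an
assumption about broker internals, not a documented setting); trouble is
expected roughly once a single segment holds more distinct keys than this.

```python
# Back-of-the-envelope estimate of how many distinct keys one log-cleaner
# thread can dedupe in a single pass. ASSUMPTIONS (not from this thread):
# a 16-byte MD5 hash + 8-byte offset per entry, and a map that is only
# filled up to log.cleaner.io.buffer.load.factor.
HASH_BYTES = 16    # assumed MD5 digest size used by the offset map
OFFSET_BYTES = 8   # a Kafka offset is a 64-bit long


def max_keys_per_pass(dedupe_buffer_bytes: int,
                      cleaner_threads: int = 1,
                      load_factor: float = 0.9) -> int:
    """Approximate number of unique keys one cleaner thread can track."""
    per_thread = dedupe_buffer_bytes // cleaner_threads  # buffer is split
    slots = per_thread // (HASH_BYTES + OFFSET_BYTES)
    return int(slots * load_factor)


MB = 1024 * 1024
# 128 MB is the broker default for log.cleaner.dedupe.buffer.size.
print(max_keys_per_pass(128 * MB))                     # one cleaner thread
print(max_keys_per_pass(128 * MB, cleaner_threads=4))  # buffer split 4 ways
```

Small messages make this worse: a segment of a given size
(log.segment.bytes) then holds more records, and potentially more distinct
keys, than the same segment filled with large messages.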



Replication Issues:

https://issues.apache.org/jira/browse/KAFKA-2477 - Replicas spuriously
deleting all segments in partition -
*This will cause the data in the changelog topic to be lost, resulting in
failure of the local key-value store bootstrap.*


Though Samza can be plugged into different messaging systems, Kafka is the
main system supported today for stateful processing. If that is the case,
the bugs above could also keep Samza from working properly (e.g. if the
replication issue called out above hits a log-compacted topic, Samza
might not be able to restore its local key-value store). Since you are
running Samza with stateful processing, these issues might leave a Samza
job's key-value store in an inconsistent state. Are you using Samza with
stateful processing for critical applications that cannot tolerate data
loss or inconsistencies? (With the above bugs, you might not be able to
run the job for a critical application, as it could fail if it hits any
of them.) I believe that upgrading to Kafka 0.9 is critical to ensure
that Samza also works properly. I do understand that this is not an issue
with Samza itself, but I believe one of the primary reasons
customers/devs choose Samza is its strong support for stateful
processing; if that does not work, or fails because of its dependency on
Kafka, it becomes necessary to upgrade Kafka as soon as possible. Please
correct me if I am wrong here.


Thanks,

Nick




------------------------------



Hi, Nick,

Let me try to answer in-between the lines:

On Thu, Mar 31, 2016 at 12:49 AM, nick xander <nickxander...@gmail.com>
wrote:

>
> * Do you guys experience issues with Kafka when it is used with log
> compaction for Samza's stateful management?
>

The critical issue with log compaction in Kafka that we care about is the
case where message compression and log compaction are *both* used on the
same topic. Currently, we forcibly turn off compression for changelog
topics, so it is not a problem for Samza's KV-stores. It is still a
problem for checkpoint topics if the Kafka producer is configured to use
message compression.
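
As an illustration of the checkpoint case, here is a hypothetical Python
helper (not part of Samza) that inspects a job's config and flags when the
checkpoint system's Kafka producer would compress messages. It assumes the
usual key layout: task.checkpoint.system names the system used by the Kafka
checkpoint manager, and systems.<system>.producer.* properties are passed
through to the Kafka producer, whose compression.type defaults to none.

```python
# Hypothetical helper (not part of Samza): would the Kafka producer used for
# checkpoints compress messages? ASSUMES task.checkpoint.system names the
# checkpoint system and systems.<system>.producer.* is passed through to the
# Kafka producer, where compression.type defaults to "none".
from typing import Mapping


def checkpoint_compression_risk(config: Mapping[str, str]) -> bool:
    """Return True if checkpoint messages would be compressed."""
    system = config.get("task.checkpoint.system")
    if system is None:
        return False  # no Kafka checkpoint system configured
    key = f"systems.{system}.producer.compression.type"
    return config.get(key, "none").strip().lower() != "none"


job_config = {
    "task.checkpoint.factory":
        "org.apache.samza.checkpoint.kafka.KafkaCheckpointManagerFactory",
    "task.checkpoint.system": "kafka",
    "systems.kafka.producer.compression.type": "gzip",
}
print(checkpoint_compression_risk(job_config))  # True
```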

> * What is the average number of keys per partition that you have observed
> in Kafka's log-compacted topics for stateful management, and the total
> number of partitions, replication factor, and number of Kafka brokers?
>

This number varies *a lot*, depending on how big your KV-store is. For
example, at LinkedIn we have seen around 5-10GB of RocksDB KV-store data
stored in changelogs. That causes a long bootstrap time when the
container is restarted on a different host. Hence, we included the
host-affinity feature in Samza 0.10, which cut down the bootstrap time for
that particular job by 20x.
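
To illustrate why host affinity shortens bootstrap, here is a toy Python
model (not Samza's restore code): a changelog is a list of (offset, key,
value) records, with None as a delete. A container on a new host starts
from an empty store and replays everything; a container restarted on the
same host is assumed to find its store on local disk along with the last
changelog offset it applied, so it only replays the tail. How Samza
actually tracks that offset is an assumption here.

```python
from typing import Dict, List, Optional, Tuple

# Toy changelog: (offset, key, value); a value of None is a delete.
Changelog = List[Tuple[int, str, Optional[str]]]


def restore(store: Dict[str, str], changelog: Changelog,
            start_offset: int) -> int:
    """Replay records at or after start_offset; return how many were replayed."""
    replayed = 0
    for offset, key, value in changelog:
        if offset < start_offset:
            continue  # already reflected in the local store
        if value is None:
            store.pop(key, None)  # tombstone removes the key
        else:
            store[key] = value
        replayed += 1
    return replayed


log = [(i, f"k{i % 100}", f"v{i}") for i in range(1000)]

# New host: empty store, replay the whole changelog.
fresh: Dict[str, str] = {}
full = restore(fresh, log, start_offset=0)

# Same host: store already holds everything up to offset 989 on disk.
local: Dict[str, str] = {}
restore(local, log[:990], start_offset=0)  # stands in for on-disk state
tail = restore(local, log, start_offset=990)

print(full, tail, fresh == local)  # 1000 10 True
```

Both paths end with the same store contents; only the number of replayed
records differs, which is where the savings come from.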

> * Will the Kafka 0.9 upgrade be included as part of Samza 0.10.1, as it
> seems critical if Samza is used for stateful management? And what
> timeline are you expecting for Samza 0.10.1?
>

We are planning to release Samza 0.10.1 very soon and are working on
pending code reviews and validations now. Depending on the test/validation
cycles, we hope to have the Samza 0.10.1 release candidate ready in a month
or so. The Kafka 0.9 upgrade will likely not be in Samza 0.10.1 due to the
tight release timeline this time.

> * What is the recommendation between using Samza and Kafka Connect?
> Should we use Samza for stateful management and Kafka Connect for other
> stateless streaming solutions?
>

KafkaConnect is mainly an ingest/output connector to/from Kafka, without
much stateful processing. Samza actually does both ingest/output and
stateful processing. If there are input data sources for which Samza does
not yet have a SystemConsumer implementation, you can definitely use
KafkaConnect for ingestion and Samza for stateful processing.

I hope the above answers your questions.

Thanks!

-Yi

On Thu, Mar 31, 2016 at 9:49 AM, nick xander <nickxander...@gmail.com>
wrote:

> Hi All,
>     As per this article:
> http://www.confluent.io/blog/290-reasons-to-upgrade-to-apache-kafka-0.9.0.0
> there are some well-known bugs and feature improvements around log
> compaction (stateful management in Samza) and replication. I also saw a
> Samza issue about this upgrade:
> https://issues.apache.org/jira/browse/SAMZA-855. My questions are:
>
> * Do you guys experience issues with Kafka when it is used with log
> compaction for Samza's stateful management?
> * What is the average number of keys per partition that you have observed
> in Kafka's log-compacted topics for stateful management, and the total
> number of partitions, replication factor, and number of Kafka brokers?
> * Will the Kafka 0.9 upgrade be included as part of Samza 0.10.1, as it
> seems critical if Samza is used for stateful management? And what
> timeline are you expecting for Samza 0.10.1?
> * What is the recommendation between using Samza and Kafka Connect?
> Should we use Samza for stateful management and Kafka Connect for other
> stateless streaming solutions?
>
> Thanks,
> Nick
>
