[
https://issues.apache.org/jira/browse/IGNITE-2016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15069479#comment-15069479
]
Denis Magda commented on IGNITE-2016:
-------------------------------------
Roman,
Ok, I've finally figured out the difference between the streamers we already
have and the connectors introduced in the Kafka 0.9 release.
I agree that a Kafka sink is a completely different concept and mustn't be
mixed with the streamers.
So far I have the following high-level (design related) review comments. Please
address them first and after that I'll start reviewing the code in detail.
1) In any case, let's put the Kafka sink implementation in the existing
{{ignite-kafka}} module. There is no need to introduce an additional module,
since all Kafka-related code will then live in a single place.
Module structure should look like this:
- {{org.apache.ignite.stream.kafka}} package will contain {{KafkaStreamer}}.
Later we can add {{KafkaStreamerV2}} to this package, implemented using the
new consumer API;
- {{org.apache.ignite.stream.kafka.connect}} package will contain your current
Kafka Connect based implementation.
2) Update {{kafka.version}}, referenced from {{ignite-kafka/pom.xml}}, to the
latest 0.9 version and check that the existing streamer still works perfectly
well (it should, according to the Kafka docs).
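For reference, the version bump in {{ignite-kafka/pom.xml}} would look roughly like this (0.9.0.0 was the first 0.9 release; check for the latest 0.9.x at the time of the change):

```xml
<properties>
    <!-- Bump from the previous 0.8.x version to the 0.9 line. -->
    <kafka.version>0.9.0.0</kafka.version>
</properties>
```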
3) {{IgniteSinkTask.flush()}} delivers data to the grid using
{{cache.putAll(...)}}. Instead of this approach I would switch to
{{IgniteDataStreamer}} and use it to stream data to Ignite. The reason is that
{{IgniteDataStreamer}} uploads data to the grid much faster than
{{cache.putAll(...)}}.
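As a sketch of the suggested shape: records buffered by the task go through {{addData(...)}} and {{flush()}} on a streamer rather than a direct {{cache.putAll(...)}}. To keep the snippet self-contained, {{IgniteDataStreamer}} is modeled here by a minimal stand-in class; in the real module it would be obtained via {{ignite.dataStreamer(cacheName)}}. All class and method names below besides the Ignite ones mentioned in this comment are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal stand-in for IgniteDataStreamer so the sketch compiles on its own.
class StandInStreamer<K, V> {
    private final Map<K, V> buffer = new HashMap<>();
    private final Map<K, V> grid; // stands in for the Ignite cache

    StandInStreamer(Map<K, V> grid) {
        this.grid = grid;
    }

    // Buffers an entry; nothing reaches the grid yet.
    void addData(K key, V val) {
        buffer.put(key, val);
    }

    // Pushes the whole buffer to the grid in one shot.
    void flush() {
        grid.putAll(buffer);
        buffer.clear();
    }
}

public class SinkFlushSketch {
    // Models put(...) forwarding records to the streamer and
    // flush() delegating to the streamer instead of cache.putAll(...).
    static Map<String, Integer> run() {
        Map<String, Integer> grid = new HashMap<>();
        StandInStreamer<String, Integer> streamer = new StandInStreamer<>(grid);

        // IgniteSinkTask.put(...) would call addData(...) per sink record...
        streamer.addData("k1", 1);
        streamer.addData("k2", 2);

        // ...and IgniteSinkTask.flush() just flushes the streamer.
        streamer.flush();
        return grid;
    }

    public static void main(String[] args) {
        System.out.println(run().size()); // 2
    }
}
```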
4) {{IgniteSinkTask.put(...)}} buffers data in some internal data structure. Is
there any Kafka API requirement saying that the data mustn't be flushed until
the {{flush}} method is called explicitly? Generally speaking, I would reuse
{{IgniteDataStreamer}} here as well, setting
{{IgniteDataStreamer.autoFlushFrequency(...)}} equal to the sink's flush
frequency and simply forwarding all the data to the streamer as soon as it's
delivered via {{IgniteSinkTask.put(...)}}. The streamer will buffer the data
and flush it to the grid with the specified frequency, or when its internal
buffer reaches some limit.
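The buffering behavior described above can be modeled in a few lines: entries handed to {{addData(...)}} reach the grid either when the auto-flush period elapses or when the buffer hits its limit. This is a toy model of the {{autoFlushFrequency}} semantics, not Ignite API; the manual {{tick(...)}} clock stands in for the timer thread a real streamer uses, and all names are illustrative.

```java
import java.util.HashMap;
import java.util.Map;

public class AutoFlushModel {
    private final Map<String, String> grid = new HashMap<>();   // stands in for the cache
    private final Map<String, String> buffer = new HashMap<>();
    private final long flushFreqMs;  // models autoFlushFrequency(...)
    private final int bufferLimit;   // models the internal buffer limit
    private long lastFlush;
    private long now;

    AutoFlushModel(long flushFreqMs, int bufferLimit) {
        this.flushFreqMs = flushFreqMs;
        this.bufferLimit = bufferLimit;
    }

    // Called for every record delivered via IgniteSinkTask.put(...).
    void addData(String key, String val) {
        buffer.put(key, val);
        if (buffer.size() >= bufferLimit)
            flush();                       // buffer limit reached
    }

    // Advances the illustrative clock; a real streamer flushes on a timer.
    void tick(long ms) {
        now += ms;
        if (now - lastFlush >= flushFreqMs)
            flush();                       // auto-flush period elapsed
    }

    void flush() {
        grid.putAll(buffer);
        buffer.clear();
        lastFlush = now;
    }

    Map<String, String> grid() {
        return grid;
    }

    public static void main(String[] args) {
        AutoFlushModel m = new AutoFlushModel(1000, 3);
        m.addData("a", "1");
        m.tick(500);                         // period not elapsed, still buffered
        System.out.println(m.grid().size()); // 0
        m.tick(600);                         // period elapsed -> auto flush
        System.out.println(m.grid().size()); // 1
    }
}
```

With this shape, {{IgniteSinkTask.put(...)}} stays a simple forward to the streamer and no hand-rolled buffering is needed in the sink itself.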
> Update KafkaStreamer to fit new features introduced in Kafka 0.9
> ----------------------------------------------------------------
>
> Key: IGNITE-2016
> URL: https://issues.apache.org/jira/browse/IGNITE-2016
> Project: Ignite
> Issue Type: New Feature
> Components: streaming
> Reporter: Roman Shtykh
> Assignee: Roman Shtykh
>
> Particularly,
> - new consumer
> - Kafka Connect (Copycat)
> http://www.confluent.io/blog/apache-kafka-0.9-is-released
> This can be a different integration task or a complete re-write of the
> current implementation, considering the fact that Kafka Connect is a new
> standard way for "large-scale, real-time data import and export for Kafka."
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)