[
https://issues.apache.org/jira/browse/IGNITE-2016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15069479#comment-15069479
]
Denis Magda commented on IGNITE-2016:
-------------------------------------
Roman,
Ok, I've finally figured out the difference between the streamers we already
have and the connectors introduced in the Kafka 0.9 release.
I agree that a Kafka sink is a completely different concept and mustn't be
mixed with the streamers.
So far I have the following high-level (design related) review comments. Please
address them first and after that I'll start reviewing the code in detail.
1) In any case, let's put the Kafka sink implementation in the existing
{{ignite-kafka}} module. There is no need to introduce an additional module,
since all Kafka-related code will then live in a single place.
Module structure should look like this:
- {{org.apache.ignite.stream.kafka}} package will contain {{KafkaStreamer}}.
Later we can add {{KafkaStreamerV2}} to this package, implemented using the
new consumer API;
- {{org.apache.ignite.stream.kafka.connect}} package will contain your current
Kafka Connect based implementation.
2) Update {{kafka.version}}, referenced from {{ignite-kafka/pom.xml}}, to the
latest 0.9 version and check that the existing streamer still works perfectly
well (it should, according to the Kafka docs).
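For reference, the version bump in {{ignite-kafka/pom.xml}} would look roughly like this (0.9.0.0 was the first 0.9 release; check for the latest 0.9.x at the time of the change):

```xml
<properties>
    <!-- Bump from the previous 0.8.x version to the 0.9 line. -->
    <kafka.version>0.9.0.0</kafka.version>
</properties>
```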
3) {{IgniteSinkTask.flush()}} delivers data to the grid using
{{cache.putAll(...)}}. Instead of this approach I would switch to
{{IgniteDataStreamer}} and use it to stream data to Ignite. The reason is that
{{IgniteDataStreamer}} uploads data to the grid much faster than
{{cache.putAll(...)}}.
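As a sketch of the suggested shape: records buffered by the task go through {{addData(...)}} and {{flush()}} on a streamer rather than a direct {{cache.putAll(...)}}. To keep the snippet self-contained, {{IgniteDataStreamer}} is modeled here by a minimal stand-in class; in the real module it would be obtained via {{ignite.dataStreamer(cacheName)}}. All class and method names below besides the Ignite ones mentioned in this comment are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal stand-in for IgniteDataStreamer so the sketch compiles on its own.
class StandInStreamer<K, V> {
    private final Map<K, V> buffer = new HashMap<>();
    private final Map<K, V> grid; // stands in for the Ignite cache

    StandInStreamer(Map<K, V> grid) {
        this.grid = grid;
    }

    // Buffers an entry; nothing reaches the grid yet.
    void addData(K key, V val) {
        buffer.put(key, val);
    }

    // Pushes the whole buffer to the grid in one shot.
    void flush() {
        grid.putAll(buffer);
        buffer.clear();
    }
}

public class SinkFlushSketch {
    // Models put(...) forwarding records to the streamer and
    // flush() delegating to the streamer instead of cache.putAll(...).
    static Map<String, Integer> run() {
        Map<String, Integer> grid = new HashMap<>();
        StandInStreamer<String, Integer> streamer = new StandInStreamer<>(grid);

        // IgniteSinkTask.put(...) would call addData(...) per sink record...
        streamer.addData("k1", 1);
        streamer.addData("k2", 2);

        // ...and IgniteSinkTask.flush() just flushes the streamer.
        streamer.flush();
        return grid;
    }

    public static void main(String[] args) {
        System.out.println(run().size()); // 2
    }
}
```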
4) {{IgniteSinkTask.put(...)}} buffers data in some internal data structure. Is
there any Kafka API requirement saying that the data mustn't be flushed until
the {{flush}} method is called explicitly? Generally speaking, I would reuse
{{IgniteDataStreamer}} here as well, setting
{{IgniteDataStreamer.autoFlushFrequency(...)}} equal to the sink's flush
frequency and simply forwarding all the data to the streamer as soon as it's
delivered via {{IgniteSinkTask.put(...)}}. The streamer will buffer the data
and flush it to the grid with the specified frequency, or when its internal
buffer reaches some limit.
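The buffering behavior described above can be modeled in a few lines: entries handed to {{addData(...)}} reach the grid either when the auto-flush period elapses or when the buffer hits its limit. This is a toy model of the {{autoFlushFrequency}} semantics, not Ignite API; the manual {{tick(...)}} clock stands in for the timer thread a real streamer uses, and all names are illustrative.

```java
import java.util.HashMap;
import java.util.Map;

public class AutoFlushModel {
    private final Map<String, String> grid = new HashMap<>();   // stands in for the cache
    private final Map<String, String> buffer = new HashMap<>();
    private final long flushFreqMs;  // models autoFlushFrequency(...)
    private final int bufferLimit;   // models the internal buffer limit
    private long lastFlush;
    private long now;

    AutoFlushModel(long flushFreqMs, int bufferLimit) {
        this.flushFreqMs = flushFreqMs;
        this.bufferLimit = bufferLimit;
    }

    // Called for every record delivered via IgniteSinkTask.put(...).
    void addData(String key, String val) {
        buffer.put(key, val);
        if (buffer.size() >= bufferLimit)
            flush();                       // buffer limit reached
    }

    // Advances the illustrative clock; a real streamer flushes on a timer.
    void tick(long ms) {
        now += ms;
        if (now - lastFlush >= flushFreqMs)
            flush();                       // auto-flush period elapsed
    }

    void flush() {
        grid.putAll(buffer);
        buffer.clear();
        lastFlush = now;
    }

    Map<String, String> grid() {
        return grid;
    }

    public static void main(String[] args) {
        AutoFlushModel m = new AutoFlushModel(1000, 3);
        m.addData("a", "1");
        m.tick(500);                         // period not elapsed, still buffered
        System.out.println(m.grid().size()); // 0
        m.tick(600);                         // period elapsed -> auto flush
        System.out.println(m.grid().size()); // 1
    }
}
```

With this shape, {{IgniteSinkTask.put(...)}} stays a simple forward to the streamer and no hand-rolled buffering is needed in the sink itself.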
> Update KafkaStreamer to fit new features introduced in Kafka 0.9
> ----------------------------------------------------------------
>
> Key: IGNITE-2016
> URL: https://issues.apache.org/jira/browse/IGNITE-2016
> Project: Ignite
> Issue Type: New Feature
> Components: streaming
> Reporter: Roman Shtykh
> Assignee: Roman Shtykh
>
> Particularly,
> - new consumer
> - Kafka Connect (Copycat)
> http://www.confluent.io/blog/apache-kafka-0.9-is-released
> This can be a different integration task or a complete re-write of the
> current implementation, considering the fact that Kafka Connect is a new
> standard way for "large-scale, real-time data import and export for Kafka."
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)