[ 
https://issues.apache.org/jira/browse/IGNITE-529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14998660#comment-14998660
 ] 

Roman Shtykh commented on IGNITE-529:
-------------------------------------

Anton, please see my comments inline.

Could you please check this issue.
R> I might have forgotten to commit something. I apologize for that and will 
check it tomorrow morning.

Sink will be created inside Flume and will use FlumeStreamer to put data into 
Ignite.
R> Right.

In this case the Extractor should be specified inside FlumeStreamer (not as a 
constructor parameter; the streamer should decide which transformer to use, or 
a custom implementation can have a custom transformer). I mean Flume and Ignite 
logic should be separated: all Flume-related logic inside Sink, and all 
Ignite-related logic inside FlumeStreamer.
DataStreamer also should not be provided by Sink, because it is Ignite-related 
logic.
R> I.e., the user should be responsible for implementing a Flume streamer, 
right? That is why my first approach was to allow extending a Flume streamer 
base class and let the user implement all event conversion and other logic. 
The implementation is then specified in Flume's configuration and instantiated 
in the sink.
R> That is the best separation we can achieve. The Ignite instance and cache 
will be created in the Flume streamer though, and that is what you said is not 
recommended.
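
For illustration, the base-class approach described above might look roughly 
like the sketch below. All names in it (SketchEvent, FlumeStreamerBaseSketch, 
UpperCaseStreamerSketch, ReflectiveSinkSketch) are hypothetical stand-ins, not 
classes from Flume, Ignite, or the patch, and a plain Map stands in for the 
Ignite cache so that the example runs without either library.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical stand-in for a Flume event (not org.apache.flume.Event). */
final class SketchEvent {
    final String key;
    final byte[] body;

    SketchEvent(String key, byte[] body) {
        this.key = key;
        this.body = body;
    }
}

/** Hypothetical base class; the user subclass owns all conversion logic. */
abstract class FlumeStreamerBaseSketch {
    // Stand-in for an Ignite cache. A real streamer would start Ignite and
    // open a data streamer here, which is the coupling discussed above.
    protected final Map<String, String> cache = new HashMap<>();

    /** Converts one event and stores the result. */
    abstract void onEvent(SketchEvent event);

    Map<String, String> cache() {
        return cache;
    }
}

/** Example user implementation: stores the body upper-cased under the key. */
class UpperCaseStreamerSketch extends FlumeStreamerBaseSketch {
    @Override
    void onEvent(SketchEvent event) {
        String text = new String(event.body, StandardCharsets.UTF_8);
        cache.put(event.key, text.toUpperCase(Locale.ROOT));
    }
}

/** Stand-in for the sink: it only knows a class name from its configuration. */
final class ReflectiveSinkSketch {
    private final FlumeStreamerBaseSketch streamer;

    ReflectiveSinkSketch(String streamerClassName) {
        FlumeStreamerBaseSketch created;
        try {
            // Instantiate the user's streamer by name, as a sink might on start.
            created = (FlumeStreamerBaseSketch) Class.forName(streamerClassName)
                .getDeclaredConstructor()
                .newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException(
                "Cannot instantiate streamer: " + streamerClassName, e);
        }
        this.streamer = created;
    }

    /** Forwards one event to the user's streamer. */
    void process(SketchEvent event) {
        streamer.onEvent(event);
    }

    FlumeStreamerBaseSketch streamer() {
        return streamer;
    }
}
```

In this sketch the sink touches nothing but the streamer base class, while the 
streamer itself owns the cache, which is the trade-off noted above.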

R> In my current approach I take care of everything except implementing an 
extractor, which is the user's responsibility (close to what most Flume sinks 
do now; please see the serializer and converter parameters at 
https://flume.apache.org/FlumeUserGuide.html).
R> Neither approach is perfect, but we have to choose one. Do you have 
another idea?
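
As a rough illustration of the extractor-based approach, the sketch below has 
the sink own the write path and delegate only the conversion to a 
user-supplied extractor. The names EventExtractorSketch, 
KeyValueExtractorSketch and ExtractorSinkSketch are hypothetical, the 
"key=value" body format is an assumption made for the example, and a plain Map 
stands in for the Ignite cache so that no Flume or Ignite classes are needed.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical extractor contract: turns one event body into cache entries. */
interface EventExtractorSketch<K, V> {
    Map<K, V> extract(byte[] eventBody);
}

/** Example user-supplied extractor: parses bodies of the form "key=value". */
final class KeyValueExtractorSketch implements EventExtractorSketch<String, String> {
    @Override
    public Map<String, String> extract(byte[] eventBody) {
        String text = new String(eventBody, StandardCharsets.UTF_8);
        int sep = text.indexOf('=');
        Map<String, String> entries = new HashMap<>();
        // Bodies without a non-empty key before '=' yield no entries.
        if (sep > 0) {
            entries.put(text.substring(0, sep), text.substring(sep + 1));
        }
        return entries;
    }
}

/** Stand-in for the sink: owns the write target, delegates only conversion. */
final class ExtractorSinkSketch<K, V> {
    private final EventExtractorSketch<K, V> extractor;
    // Stand-in for the Ignite cache that a real sink would stream into.
    private final Map<K, V> cache = new HashMap<>();

    ExtractorSinkSketch(EventExtractorSketch<K, V> extractor) {
        this.extractor = extractor;
    }

    /** Converts one event body with the extractor and stores the entries. */
    void process(byte[] eventBody) {
        cache.putAll(extractor.extract(eventBody));
    }

    Map<K, V> cache() {
        return cache;
    }
}
```

Here the user writes only the extractor, while instance and cache handling 
stay in one place.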

In case it is impossible to provide a configured instance of FlumeStreamer to 
Sink, FlumeStreamer should build itself using some configuration, possibly 
provided via constructor.
R> Sorry, I am not sure I understand how Sink can send events to Ignite if 
FlumeStreamer is not specified...

Sink in my view is just a channel, and it should communicate with FlumeStreamer 
only, without using other apache.ignite.* classes.

R> I've seen many implementations of Flume sinks. As I mentioned, a sink takes 
responsibility for instantiating the components that save incoming data. On 
the other hand, an Ignite streamer (judging from the implementations we have 
now) also wants to instantiate components (clients) for pulling data.
R> To achieve complete separation of sink and streamer, I can only think of 
running the sink and the streamer in separate JVMs communicating over some 
protocol (which already adds overhead).


> Implement IgniteFlumeStreamer to stream data from Apache Flume
> --------------------------------------------------------------
>
>                 Key: IGNITE-529
>                 URL: https://issues.apache.org/jira/browse/IGNITE-529
>             Project: Ignite
>          Issue Type: Sub-task
>          Components: streaming
>            Reporter: Dmitriy Setrakyan
>            Assignee: Roman Shtykh
>
> We have {{IgniteDataStreamer}} which is used to load data into Ignite under 
> high load. It was previously named {{IgniteDataLoader}}, see ticket 
> IGNITE-394.
> See [Apache Flume|http://flume.apache.org/] for more information.
> We should create {{IgniteFlumeStreamer}} which will consume messages from 
> Apache Flume and stream them into Ignite caches. 
> More details to follow, but at the least we should be able to:
> * Convert Flume data to Ignite data using an optional pluggable converter.
> * Specify the cache name for the Ignite cache to load data into.
> * Specify other flags available on {{IgniteDataStreamer}} class.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
