[ https://issues.apache.org/jira/browse/IGNITE-529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14998660#comment-14998660 ]
Roman Shtykh commented on IGNITE-529:
-------------------------------------
Anton, please see my comments inline.
Could you please check this issue.
R> I might have forgotten to commit something. I apologize for that and will
check it tomorrow morning.
Sink will be created inside Flume and use FlumeStreamer to put data to Ignite.
R> Right.
In this case the Extractor should be specified inside FlumeStreamer (not as a
constructor parameter; the streamer should decide which transformer to use, or a
custom implementation can have a custom transformer). I mean that Flume and
Ignite logic should be separated: all Flume-related logic inside Sink and all
Ignite-related logic inside FlumeStreamer.
DataStreamer also should not be provided by Sink, because it is Ignite-related
logic.
R> I.e., the user should be responsible for implementing a Flume streamer,
right? That is why my first approach was to enable extending a Flume streamer
base class and let the user implement all event conversion and other logic.
Then it is specified in Flume's configuration and instantiated in the sink.
R> That is the best separation we can achieve. The Ignite instance and cache
will be created in the Flume streamer though, which you mentioned is not
recommended.
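R> To make the base-class approach concrete, here is a minimal, self-contained
sketch. Every name in it (SketchStreamerBase, SketchLengthStreamer,
SketchConfiguredSink, the "streamerClass" key) is hypothetical and is not part
of the Flume or Ignite API. A plain HashMap stands in for the Ignite cache so
that the example runs without a cluster.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical base class the user extends. It owns the "Ignite side";
// a HashMap stands in for the Ignite instance and cache (assumption).
abstract class SketchStreamerBase {
    protected final Map<String, Integer> cache = new HashMap<>();

    // The user implements all event conversion here.
    abstract void onEvent(byte[] body);

    Map<String, Integer> cache() {
        return cache;
    }
}

// Example user implementation: stores each body text with its length.
class SketchLengthStreamer extends SketchStreamerBase {
    @Override
    void onEvent(byte[] body) {
        String text = new String(body, StandardCharsets.UTF_8);
        cache.put(text, text.length());
    }
}

// Hypothetical sink: knows only the base class and a class name from config.
final class SketchConfiguredSink {
    // Hypothetical configuration key, not a real Flume property.
    static final String STREAMER_CLASS_KEY = "streamerClass";

    private final SketchStreamerBase streamer;

    SketchConfiguredSink(Map<String, String> conf) {
        this.streamer = instantiate(conf.get(STREAMER_CLASS_KEY));
    }

    // Creates the user's streamer reflectively from its class name.
    private static SketchStreamerBase instantiate(String className) {
        if (className == null) {
            throw new IllegalStateException("Missing " + STREAMER_CLASS_KEY);
        }
        try {
            return (SketchStreamerBase) Class.forName(className)
                .getDeclaredConstructor()
                .newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + className, e);
        }
    }

    // Hands the raw event body to the streamer; no conversion happens here.
    void process(byte[] body) {
        streamer.onEvent(body);
    }

    SketchStreamerBase streamer() {
        return streamer;
    }
}
```

With this shape the sink never touches Ignite classes, but the streamer has to
create the Ignite instance and cache itself, which is the drawback noted above.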
R> In my current approach I take care of everything except implementing an
extractor, which is the user's responsibility (close to what most Flume sinks
do now; please see the serializer and converter parameters at
https://flume.apache.org/FlumeUserGuide.html).
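R> As a rough illustration of the extractor-based approach, here is a minimal,
self-contained sketch. All names (SketchEvent, SketchExtractor, SketchSink, the
"id" header) are hypothetical stand-ins and not the actual Flume or Ignite
classes. A HashMap again stands in for the Ignite cache and data streamer.

```java
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a Flume event (headers plus body).
final class SketchEvent {
    final Map<String, String> headers;
    final byte[] body;

    SketchEvent(Map<String, String> headers, byte[] body) {
        this.headers = headers;
        this.body = body;
    }
}

// Hypothetical extractor contract: the only piece the user implements.
interface SketchExtractor<K, V> {
    // Returns the cache entry for an event, or null to skip the event.
    Map.Entry<K, V> extract(SketchEvent event);
}

// Hypothetical sink that takes care of everything else. The HashMap is an
// assumption standing in for the Ignite cache; no Ignite node is started.
final class SketchSink<K, V> {
    private final SketchExtractor<K, V> extractor;
    private final Map<K, V> cache = new HashMap<>();

    SketchSink(SketchExtractor<K, V> extractor) {
        this.extractor = extractor;
    }

    // Converts each event with the user's extractor and stores the result.
    void process(List<SketchEvent> batch) {
        for (SketchEvent event : batch) {
            Map.Entry<K, V> entry = extractor.extract(event);
            if (entry != null) {
                cache.put(entry.getKey(), entry.getValue());
            }
        }
    }

    Map<K, V> cache() {
        return cache;
    }
}

final class ExtractorSketchDemo {
    public static void main(String[] args) {
        // User-side code: key from the assumed "id" header, value from the body.
        SketchExtractor<String, String> extractor = event ->
            new AbstractMap.SimpleEntry<>(
                event.headers.get("id"),
                new String(event.body, StandardCharsets.UTF_8));

        SketchSink<String, String> sink = new SketchSink<>(extractor);

        Map<String, String> headers = new HashMap<>();
        headers.put("id", "42");
        sink.process(Arrays.asList(
            new SketchEvent(headers, "hello".getBytes(StandardCharsets.UTF_8))));

        System.out.println(sink.cache());
    }
}
```

Here the user writes only the extractor; the trade-off is that the sink itself
ends up holding the Ignite-related pieces.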
R> Neither approach is perfect, but we have to choose one. Do you have
another idea?
In case it is impossible to provide a configured instance of FlumeStreamer to
Sink, FlumeStreamer should build itself using some configuration, possibly
provided via the constructor.
R> Sorry, I am not sure I understand how Sink can send events to Ignite if
FlumeStreamer is not specified...
Sink in my view is just a channel, and it should communicate with FlumeStreamer
only, without using other apache.ignite.* classes.
R> I've seen many implementations of Flume sinks. As I mentioned, the sink takes
responsibility for instantiating the components that save incoming data. On
the other hand, the Ignite streamer (judging from the implementation we have
now) also wants to instantiate components (clients) for pulling data.
R> To achieve complete separation of sink and streamer, I can think only of
having the sink and streamer in separate JVMs communicating over some protocol
(which is already an overhead).
> Implement IgniteFlumeStreamer to stream data from Apache Flume
> --------------------------------------------------------------
>
> Key: IGNITE-529
> URL: https://issues.apache.org/jira/browse/IGNITE-529
> Project: Ignite
> Issue Type: Sub-task
> Components: streaming
> Reporter: Dmitriy Setrakyan
> Assignee: Roman Shtykh
>
> We have {{IgniteDataStreamer}} which is used to load data into Ignite under
> high load. It was previously named {{IgniteDataLoader}}, see ticket
> IGNITE-394.
> See [Apache Flume|http://flume.apache.org/] for more information.
> We should create {{IgniteFlumeStreamer}} which will consume messages from
> Apache Flume and stream them into Ignite caches.
> More details to follow, but at the least we should be able to:
> * Convert Flume data to Ignite data using an optional pluggable converter.
> * Specify the cache name for the Ignite cache to load data into.
> * Specify other flags available on {{IgniteDataStreamer}} class.