[
https://issues.apache.org/jira/browse/IGNITE-529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15000123#comment-15000123
]
Anton Vinogradov edited comment on IGNITE-529 at 11/11/15 9:02 AM:
-------------------------------------------------------------------
Roman,
R> Ok. So a cache is created based on what is in Ignite configurations xml,
right?
A> The cache should already be created externally on the Ignite cluster before the Sink
starts. I'm not sure, but it seems you can leave the cache out of the configuration and
still get it using ignite.cache(cacheName).
R> Just out of curiosity, what is the reason for not allowing
getOrCreateCache()? Is it for having all configurations in one place (xml
file)? With getCache() the user will have to specify the cache name both in
Ignite configurations xml and sink configurations file.
A> The cache name should be provided via a parameter (similar to the current solution). The
only reason for using ignite.cache(cacheName) (and not allowing
getOrCreateCache()) is that we need a guarantee that no redundant cache will be
created because of a wrong Sink configuration.
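The fail-fast lookup described here could be sketched roughly as follows. This is only an illustration, not the actual implementation: the Function stands in for ignite.cache(cacheName), which returns null for a cache that does not exist, and the method name and error message are invented for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class CacheLookup {
    // Fail fast: return the existing cache or throw, never create one implicitly.
    // The lookup function stands in for ignite.cache(cacheName), which returns
    // null when the named cache has not been created on the cluster.
    static <C> C obtainCache(String cacheName, Function<String, C> lookup) {
        C cache = lookup.apply(cacheName);
        if (cache == null)
            throw new IllegalArgumentException(
                "Cache '" + cacheName + "' does not exist; check the Sink configuration.");
        return cache;
    }

    public static void main(String[] args) {
        // Stand-in for the cluster's set of existing caches.
        Map<String, Object> cluster = new HashMap<>();
        cluster.put("flumeCache", new Object());

        // The configured cache exists, so the lookup succeeds.
        System.out.println("found: " + (obtainCache("flumeCache", cluster::get) != null));

        // A misconfigured name fails fast instead of creating a redundant cache.
        try {
            obtainCache("typoCache", cluster::get);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```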
R> Do you mean the transformer is something more than just converting Flume
event to key&value types of the cache? Something that exposes a cache instance
to the user and the user can choose whether to use put() or putAll() in his/her
implementation?
R> I wouldn't expose the cache to the user; just let him/her implement a data
conversion interface to specify in the configuration (option 2 in your proposal) and use
putAll(), since this will probably be the most used method, considering that batching
is good for large data loads (even if creating maps may introduce memory
overhead).
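The conversion interface proposed here could look roughly like the sketch below. All names (EventTransformer, KeyValueTransformer, the key=value body format) are assumptions for illustration, and the Event class is a trimmed stand-in for org.apache.flume.Event; the sink would pass the returned map straight to cache.putAll().

```java
import java.util.HashMap;
import java.util.Map;

public class TransformerSketch {
    // Minimal stand-in for a Flume event (the real org.apache.flume.Event
    // also carries headers alongside the byte[] body).
    static class Event {
        final byte[] body;
        Event(byte[] body) { this.body = body; }
    }

    // The conversion interface the user would implement and name in the
    // sink configuration; one event may yield many cache entries.
    interface EventTransformer<K, V> {
        Map<K, V> transform(Event event);
    }

    // Hypothetical implementation: parse comma-separated "key=value" pairs.
    static class KeyValueTransformer implements EventTransformer<String, String> {
        @Override public Map<String, String> transform(Event event) {
            Map<String, String> entries = new HashMap<>();
            for (String pair : new String(event.body).split(",")) {
                String[] kv = pair.split("=", 2);
                if (kv.length == 2)
                    entries.put(kv[0].trim(), kv[1].trim());
            }
            return entries;
        }
    }

    public static void main(String[] args) {
        Event event = new Event("a=1, b=2".getBytes());
        Map<String, String> batch = new KeyValueTransformer().transform(event);
        // The sink would hand the whole batch to cache.putAll(batch).
        System.out.println(batch.get("a") + " " + batch.get("b"));
    }
}
```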
R> What do you think?
A> I think the initial implementation can use code similar to the following:
Transaction transaction = channel.getTransaction();
try {
    transaction.begin();
    Event event = channel.take();
    if (event != null)
        cache.putAll(transformer.transform(event));
    else
        status = Status.BACKOFF;
    transaction.commit();
} catch (Throwable t) {
    transaction.rollback();
    throw t;
} finally {
    transaction.close();
}
However, using IgniteDataStreamer could bring more speed in case you get a
lot of keys while processing an event.
It seems a custom transformer could produce a huge amount of data. Is that a possible
case?
In that case, building a Map and calling putAll() will bring memory overhead and
delays. A transformer that calls dataStreamer.addData() while processing the event and
flush() when it finishes would handle the event faster. But I'm still not sure this is a
possible case.
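The streaming variant described here, where the transformer pushes entries as it parses instead of accumulating a Map, might be sketched like this. The BiConsumer stands in for IgniteDataStreamer.addData(key, value) (the real streamer also needs flush()/close() when the event is done); the interface name and the line-per-entry format are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;

public class StreamingTransformerSketch {
    // A streaming-style transformer receives a sink for entries instead of
    // returning a Map, so a huge event never has to materialize in memory
    // at once. The BiConsumer stands in for IgniteDataStreamer.addData(k, v).
    interface StreamingTransformer<K, V> {
        void transform(byte[] eventBody, BiConsumer<K, V> streamer);
    }

    public static void main(String[] args) {
        // Hypothetical transformer: each line of the body becomes one entry,
        // keyed by its line number.
        StreamingTransformer<Integer, String> transformer = (body, streamer) -> {
            int lineNo = 0;
            for (String line : new String(body).split("\n"))
                streamer.accept(lineNo++, line);
        };

        // Collect what the "streamer" receives, to show entries arrive one
        // at a time rather than as a single batched Map.
        List<String> received = new ArrayList<>();
        transformer.transform("first\nsecond".getBytes(),
            (k, v) -> received.add(k + ":" + v));

        System.out.println(received);
    }
}
```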
R> What instance do you have in mind? Normally it is sufficient to have a
channel and a sink, as it is in my tests. Or you can run it as I described in the
README (but that is not for tests).
A> I see you are forcing
channel.put(event); like Flume would.
That seems to be a correct test case, but I wonder whether it is possible to bring more of
Flume into the test, e.g. use embedded Flume or something like that ;). That's just a
question, not a proposal.
> Implement IgniteFlumeStreamer to stream data from Apache Flume
> --------------------------------------------------------------
>
> Key: IGNITE-529
> URL: https://issues.apache.org/jira/browse/IGNITE-529
> Project: Ignite
> Issue Type: Sub-task
> Components: streaming
> Reporter: Dmitriy Setrakyan
> Assignee: Roman Shtykh
>
> We have {{IgniteDataStreamer}} which is used to load data into Ignite under
> high load. It was previously named {{IgniteDataLoader}}, see ticket
> IGNITE-394.
> See [Apache Flume|http://flume.apache.org/] for more information.
> We should create {{IgniteFlumeStreamer}} which will consume messages from
> Apache Flume and stream them into Ignite caches.
> More details to follow, but at the least we should be able to:
> * Convert Flume data to Ignite data using an optional pluggable converter.
> * Specify the cache name for the Ignite cache to load data into.
> * Specify other flags available on {{IgniteDataStreamer}} class.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)