[https://issues.apache.org/jira/browse/FLUME-2500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14173235#comment-14173235]
Hari Shreedharan commented on FLUME-2500:
-----------------------------------------
Agreed, that pretty much sums it up.
The reason you'd write to Flume is that Flume's API is much simpler and easier
to use than Kafka's. Also, the Kafka API changes between major releases, and
writing to Flume shields the user from that. (This is my opinion, and others
might disagree; it is the impression I got from writing code against Kafka.)
Flume is also much more effective at aggregating data from a large number of
producers into just a handful of consumers, especially when sending data over a
cross-datacenter link.
So you'd have [a large number of producers] -> [a few Flume agents using the
Kafka channel] -> HDFS.
I expect both of these use cases to come up.
> Add a channel that uses Kafka
> ------------------------------
>
> Key: FLUME-2500
> URL: https://issues.apache.org/jira/browse/FLUME-2500
> Project: Flume
> Issue Type: Bug
> Reporter: Hari Shreedharan
> Assignee: Hari Shreedharan
> Attachments: FLUME-2500.patch
>
>
> Here is the rationale:
> - Kafka gives us an HA channel, which means a dead agent does not affect the
> data in the channel, thus reducing delivery delay.
> - Kafka is used by many companies, so it would be a good idea to use Flume to
> pull data from Kafka and write it to HDFS, HBase, etc.
> This channel is not going to be useful where Kafka is not already in use,
> since it brings in the operational overhead of maintaining two systems. But
> if Kafka is already in use, this is a good way to integrate Kafka and Flume.
> Here is a scratch implementation:
> https://github.com/harishreedharan/flume/blob/kafka-channel/flume-ng-channels/flume-kafka-channel/src/main/java/org/apache/flume/channel/kafka/KafkaChannel.java
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)