[
https://issues.apache.org/jira/browse/FLUME-2500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14171466#comment-14171466
]
Hari Shreedharan commented on FLUME-2500:
-----------------------------------------
Roshan:
There are 2 motivations for this channel:
#. Be able to provide a distributed channel for Flume. We could implement one
internally in Flume, which I would definitely prefer and which is better
operations-wise, but I don't have the cycles to do it. The alternative is to
use an existing service (like Hazelcast, used by Ashish, or Kafka in this case).
#. Given that a user has a Kafka cluster, be able to use Flume's sources and
sinks to either get data into Kafka from the variety of Flume sources, or write
to HDFS from Kafka using Flume's sinks. The reason a channel works better than
the Kafka source is that a dead Flume agent won't affect event delivery - the
events just get routed via another agent.
> Add a channel that uses Kafka
> ------------------------------
>
> Key: FLUME-2500
> URL: https://issues.apache.org/jira/browse/FLUME-2500
> Project: Flume
> Issue Type: Bug
> Reporter: Hari Shreedharan
> Assignee: Hari Shreedharan
>
> Here is the rationale:
> - Kafka does give an HA channel, which means a dead agent does not affect the
> data in the channel - thus reducing delivery delay.
> - Kafka is used by many companies - it would be a good idea to use Flume to
> pull data from Kafka and write it to HDFS/HBase etc.
> This channel is not going to be useful for cases where Kafka is not already
> used, since it brings in the operational overhead of maintaining two systems, but
> if Kafka is already in use, this is a good way to integrate Kafka and Flume.
> Here is a scratch implementation:
> https://github.com/harishreedharan/flume/blob/kafka-channel/flume-ng-channels/flume-kafka-channel/src/main/java/org/apache/flume/channel/kafka/KafkaChannel.java
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)