[
https://issues.apache.org/jira/browse/FLUME-2500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14171444#comment-14171444
]
Roshan Naik commented on FLUME-2500:
------------------------------------
My thoughts ...
- It seems better for clients to write directly to Kafka and bypass Flume
altogether in such a case.
- The data flow seems unnecessarily complex ... data gets pushed out to a remote
service when going from source -> kafka channel, then brought back to the local
host when an event flows from channel -> sink. It seems better for the data flow
to be something like: client -> Kafka (via a local Flume agent using a Kafka
sink), and then some subscriber which pulls from Kafka.
- Kafka being a remote service, both Flume sources & sinks will get coupled to
intermittent failures when communicating with Kafka (sort of like the JDBC
channel).
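
For illustration, the alternative flow suggested above (client -> local Flume
agent -> Kafka via a Kafka sink, with a separate subscriber downstream) might
look roughly like the following agent configuration. This is only a sketch: the
agent name "a1", the topic "events", the port, and the broker address are
placeholder assumptions, not values taken from this issue, and the Kafka sink
property names have varied across Flume releases.

```properties
# Hypothetical agent "a1": Avro source -> memory channel -> Kafka sink.
# Topic name and broker list below are placeholders, not from FLUME-2500.
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Clients on the local host push events to this Avro source.
a1.sources.r1.type = avro
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 41414
a1.sources.r1.channels = c1

# Local buffering stays in-process; Kafka is only touched by the sink.
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000

# Kafka sink publishes to the remote Kafka cluster; downstream
# subscribers consume from the topic independently of this agent.
a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.k1.topic = events
a1.sinks.k1.brokerList = kafka-broker:9092
a1.sinks.k1.channel = c1
```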
> Add a channel that uses Kafka
> ------------------------------
>
> Key: FLUME-2500
> URL: https://issues.apache.org/jira/browse/FLUME-2500
> Project: Flume
> Issue Type: Bug
> Reporter: Hari Shreedharan
> Assignee: Hari Shreedharan
>
> Here is the rationale:
> - Kafka gives us an HA channel, which means a dead agent does not affect the
> data in the channel - thus reducing delivery delay.
> - Kafka is used by many companies - it would be a good idea to use Flume to
> pull data from Kafka and write it to HDFS/HBase etc.
> This channel is not going to be useful for cases where Kafka is not already
> in use, since it brings in the operational overhead of maintaining two
> systems, but if Kafka is already in use - this is a good way to integrate
> Kafka and Flume.
> Here is a scratch implementation:
> https://github.com/harishreedharan/flume/blob/kafka-channel/flume-ng-channels/flume-kafka-channel/src/main/java/org/apache/flume/channel/kafka/KafkaChannel.java
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)