[ 
https://issues.apache.org/jira/browse/FLUME-2500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14171444#comment-14171444
 ] 

Roshan Naik commented on FLUME-2500:
------------------------------------

My thoughts ...
- It seems better for clients to write directly to Kafka and bypass Flume 
altogether in such a case. 
- The data flow seems unnecessarily complex ... data gets pushed out to a remote 
service when going from source -> Kafka channel, then brought back to the local 
host when the event flows from channel -> sink. It seems better for the data flow 
to be something like ...   client -> Kafka (via a local Flume agent using a Kafka 
sink), and then some subscriber which pulls from Kafka.
- Kafka being a remote service, both Flume sources & sinks will get coupled to 
intermittent failures when communicating with Kafka (sort of like the JDBC 
channel).
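
The alternative flow suggested above could be sketched roughly as the following agent configuration. This is a hypothetical illustration, not a tested setup: the agent/component names (agent1, r1, c1, k1), the topic name, and the broker address are made up, and the Kafka sink property keys are assumptions.

```properties
# Sketch of: client -> local Flume agent (Kafka sink) -> Kafka -> subscribers.
# All names and hosts below are illustrative placeholders.
agent1.sources = r1
agent1.channels = c1
agent1.sinks = k1

# Clients send events to a local Avro source.
agent1.sources.r1.type = avro
agent1.sources.r1.bind = 0.0.0.0
agent1.sources.r1.port = 41414
agent1.sources.r1.channels = c1

# A plain local channel; no round trip to a remote service here.
agent1.channels.c1.type = memory
agent1.channels.c1.capacity = 10000

# The Kafka sink pushes events out to the remote Kafka cluster once.
agent1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
agent1.sinks.k1.topic = events
agent1.sinks.k1.brokerList = kafka-host:9092
agent1.sinks.k1.channel = c1
```

Downstream consumers would then subscribe to the Kafka topic directly, rather than reading back through a second Flume hop.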

> Add a channel that uses Kafka 
> ------------------------------
>
>                 Key: FLUME-2500
>                 URL: https://issues.apache.org/jira/browse/FLUME-2500
>             Project: Flume
>          Issue Type: Bug
>            Reporter: Hari Shreedharan
>            Assignee: Hari Shreedharan
>
> Here is the rationale:
> - Kafka does give a HA channel, which means a dead agent does not affect the 
> data in the channel - thus reducing delay of delivery.
> - Kafka is used by many companies - it would be a good idea to use Flume to 
> pull data from Kafka and write it to HDFS/HBase etc. 
> This channel is not going to be useful for cases where Kafka is not already 
> used, since it brings the operational overhead of maintaining two systems, but 
> if there is Kafka in use - this is a good way to integrate Kafka and Flume.
> Here is a scratch implementation: 
> https://github.com/harishreedharan/flume/blob/kafka-channel/flume-ng-channels/flume-kafka-channel/src/main/java/org/apache/flume/channel/kafka/KafkaChannel.java



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)