[ 
https://issues.apache.org/jira/browse/FLUME-2410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jay vyas updated FLUME-2410:
----------------------------

    Description: 
Flume has been designed to support any Hadoop Compatible File System (HCFS), but it 
hard-codes HDFS-specific semantics.

We can build sinks that work with different Hadoop-compatible file systems. For 
example, the following configuration will write to glusterfs if the glusterfs 
hadoop plugin is enabled:

{noformat}
        agent.channels.memory-channel.type = memory
        agent.channels.memory-channel.capacity = 2000
        agent.channels.memory-channel.transactionCapacity = 100
        agent.sources.tail-source.type = exec
        agent.sources.tail-source.command = tail -F /tmp/flume-smoke
        agent.sources.tail-source.channels = memory-channel

        agent.sinks.log-sink.channel = memory-channel
        agent.sinks.log-sink.type = logger

        # Define a sink that outputs to the DFS
        agent.sinks.hdfs-sink.channel = memory-channel
        agent.sinks.hdfs-sink.type = hdfs
        agent.sinks.hdfs-sink.hdfs.path = glusterfs:///tmp/flume-test
        agent.sinks.hdfs-sink.hdfs.fileType = DataStream

        # activate the channels/sinks/sources
        agent.channels = memory-channel
        agent.sources = tail-source
        agent.sinks = log-sink hdfs-sink
{noformat}

Similar streams would work for S3 and other file systems, as long as the file 
system is configured properly in Hadoop (core-site.xml).
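
As an illustration, an analogous S3 sink might look like the following. This is a sketch only: the bucket name is hypothetical, and it assumes the s3a connector is on the classpath with credentials configured in core-site.xml.

{noformat}
        # Hypothetical S3 sink: same hdfs sink type, different HCFS scheme.
        # Assumes the s3a connector is available and credentials are set in
        # core-site.xml (fs.s3a.access.key / fs.s3a.secret.key).
        agent.sinks.s3-sink.channel = memory-channel
        agent.sinks.s3-sink.type = hdfs
        agent.sinks.s3-sink.hdfs.path = s3a://example-bucket/flume-test
        agent.sinks.s3-sink.hdfs.fileType = DataStream
{noformat}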

Since these are all Hadoop-compatible file systems, and Flume clearly supports 
them, Flume should support {{agent.sinks.HCFS-sink.hcfs.path}} as the way of 
defining these streams, and possibly deprecate the 
{{agent.sinks.hdfs-sink.hdfs.path}} semantics, because the current naming 
misleadingly suggests that HDFS is the only Hadoop-compatible file system Flume 
supports.
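
Under the proposed naming, the glusterfs example above might be rewritten as follows. This is a sketch of the suggested {{hcfs}} keys only, not an existing Flume feature:

{noformat}
        # Proposed (hypothetical) HCFS-neutral naming; not currently supported.
        agent.sinks.hcfs-sink.channel = memory-channel
        agent.sinks.hcfs-sink.type = hcfs
        agent.sinks.hcfs-sink.hcfs.path = glusterfs:///tmp/flume-test
        agent.sinks.hcfs-sink.hcfs.fileType = DataStream
{noformat}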





> Support HCFS Semantics for sinks/sources (agent.sinks.hdfs-sink.hcfs.path)
> --------------------------------------------------------------------------
>
>                 Key: FLUME-2410
>                 URL: https://issues.apache.org/jira/browse/FLUME-2410
>             Project: Flume
>          Issue Type: Bug
>          Components: Sinks+Sources
>            Reporter: jay vyas
>            Priority: Minor
>



--
This message was sent by Atlassian JIRA
(v6.2#6252)
