[
https://issues.apache.org/jira/browse/FLUME-2410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
jay vyas updated FLUME-2410:
----------------------------
Description:
Flume is designed to support any Hadoop Compatible File System (HCFS), but it
hard-codes HDFS semantics into its sink configuration.
For example, we can already build sinks that work with other Hadoop-compatible
file systems: the following configuration will write to GlusterFS if the
glusterfs-hadoop plugin is enabled:
{noformat}
agent.channels.memory-channel.type = memory
agent.channels.memory-channel.capacity = 2000
agent.channels.memory-channel.transactionCapacity = 100
agent.sources.tail-source.type = exec
agent.sources.tail-source.command = tail -F /tmp/flume-smoke
agent.sources.tail-source.channels = memory-channel
agent.sinks.log-sink.channel = memory-channel
agent.sinks.log-sink.type = logger
# Define a sink that outputs to the DFS
agent.sinks.hdfs-sink.channel = memory-channel
agent.sinks.hdfs-sink.type = hdfs
agent.sinks.hdfs-sink.hdfs.path = glusterfs:///tmp/flume-test
agent.sinks.hdfs-sink.hdfs.fileType = DataStream
# activate the channels/sinks/sources
agent.channels = memory-channel
agent.sources = tail-source
agent.sinks = log-sink hdfs-sink
{noformat}
Similar streams would exist for S3 and so on, as long as the file system is
configured properly in Hadoop (core-site.xml).
Since these are all Hadoop-compatible file systems, and Flume clearly supports
them, Flume should support {{agent.sinks.HCFS-sink.hcfs.path}} as the way of
defining these streams, and possibly deprecate the
{{agent.sinks.hdfs-sink.hdfs.path}} semantics, because the current naming
misleadingly suggests that HDFS is the only Hadoop-compatible file system
Flume supports.
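For reference, "configured properly" typically means registering the alternative
file system's implementation class in core-site.xml. A minimal sketch is below;
the property and class names assume the glusterfs-hadoop plugin and may differ
by plugin version, so check your plugin's documentation:
{noformat}
<!-- core-site.xml: register the GlusterFS HCFS implementation -->
<!-- (property/class names assume the glusterfs-hadoop plugin; adjust for your version) -->
<property>
  <name>fs.glusterfs.impl</name>
  <value>org.apache.hadoop.fs.glusterfs.GlusterFileSystem</value>
</property>
{noformat}
With this in place, any client of the Hadoop FileSystem API (including the
Flume sink above) can resolve {{glusterfs://}} URIs.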
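To make the proposal concrete, a sink definition under the suggested HCFS
semantics might look like the following. The {{hcfs}} sink type and property
names here are hypothetical, illustrating only the proposed naming (the S3
bucket path is likewise a placeholder):
{noformat}
# Hypothetical HCFS sink: same mechanics as the existing hdfs sink,
# but with file-system-neutral naming
agent.sinks.HCFS-sink.channel = memory-channel
agent.sinks.HCFS-sink.type = hcfs
agent.sinks.HCFS-sink.hcfs.path = s3a://flume-bucket/flume-test
agent.sinks.HCFS-sink.hcfs.fileType = DataStream
{noformat}
The underlying implementation could remain the current HDFS sink code, since it
already goes through the Hadoop FileSystem API; only the configuration surface
would change.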
was:
Flume has been designed to support any Hadoop Compatible File System, but hard
codes semantics for HDFS....
For example: we can build sinks that work well with different hadoop compatible
file systems. For example, the following will write to glusterfs if the
glusterfs hadoop plugin is enabled:
{noformat}
agent.channels.memory-channel.type = memory
agent.channels.memory-channel.capacity = 2000
agent.channels.memory-channel.transactionCapacity = 100
agent.sources.tail-source.type = exec
agent.sources.tail-source.command = tail -F /tmp/flume-smoke
agent.sources.tail-source.channels = memory-channel
agent.sinks.log-sink.channel = memory-channel
agent.sinks.log-sink.type = logger
# Define a sink that outputs to the DFS
agent.sinks.hdfs-sink.channel = memory-channel
agent.sinks.hdfs-sink.type = hdfs
agent.sinks.hdfs-sink.hdfs.path = glusterfs:///tmp/flume-test
agent.sinks.hdfs-sink.hdfs.fileType = DataStream
# activate the channels/sinks/sources
agent.channels = memory-channel
agent.sources = tail-source
agent.sinks = log-sink hdfs-sink
{noformat}
Similar streams would exist for S3 and so on - as long as the file system is
configured properly in hadoop (core-site.xml).
Since these are examples of hadoop compatible file systems - and flume clearly
supports them - flume should support {{agent.sinks.hdfs-sink.hcfs.path}} as
the way of defining these streams, and possibly deprecate the
{{agent.sinks.hdfs-sink.hdfs.path}} semantics - because it misleads to assume
that hdfs is the only hadoop compatible file system which flume supports.
> Support HCFS Semantics for sinks/sources (agent.sinks.hdfs-sink.hcfs.path)
> --------------------------------------------------------------------------
>
> Key: FLUME-2410
> URL: https://issues.apache.org/jira/browse/FLUME-2410
> Project: Flume
> Issue Type: Bug
> Components: Sinks+Sources
> Reporter: jay vyas
> Priority: Minor
>
> Flume is designed to support any Hadoop Compatible File System (HCFS), but it
> hard-codes HDFS semantics into its sink configuration.
> For example, we can already build sinks that work with other Hadoop-compatible
> file systems: the following configuration will write to GlusterFS if the
> glusterfs-hadoop plugin is enabled:
> {noformat}
> agent.channels.memory-channel.type = memory
> agent.channels.memory-channel.capacity = 2000
> agent.channels.memory-channel.transactionCapacity = 100
> agent.sources.tail-source.type = exec
> agent.sources.tail-source.command = tail -F /tmp/flume-smoke
> agent.sources.tail-source.channels = memory-channel
> agent.sinks.log-sink.channel = memory-channel
> agent.sinks.log-sink.type = logger
> # Define a sink that outputs to the DFS
> agent.sinks.hdfs-sink.channel = memory-channel
> agent.sinks.hdfs-sink.type = hdfs
> agent.sinks.hdfs-sink.hdfs.path = glusterfs:///tmp/flume-test
> agent.sinks.hdfs-sink.hdfs.fileType = DataStream
> # activate the channels/sinks/sources
> agent.channels = memory-channel
> agent.sources = tail-source
> agent.sinks = log-sink hdfs-sink
> {noformat}
> Similar streams would exist for S3 and so on, as long as the file system is
> configured properly in Hadoop (core-site.xml).
> Since these are all Hadoop-compatible file systems, and Flume clearly supports
> them, Flume should support {{agent.sinks.HCFS-sink.hcfs.path}} as the way of
> defining these streams, and possibly deprecate the
> {{agent.sinks.hdfs-sink.hdfs.path}} semantics, because the current naming
> misleadingly suggests that HDFS is the only Hadoop-compatible file system
> Flume supports.
--
This message was sent by Atlassian JIRA
(v6.2#6252)