[
https://issues.apache.org/jira/browse/FLUME-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13633781#comment-13633781
]
Thiruvalluvan M. G. commented on FLUME-2003:
--------------------------------------------
In my case, we use Flume to collect events whose sizes range from very small to
very large. At present, a preprocessing stage trims the events to reduce the
load on HDFS, but that stage is neither reliable nor scalable. We'd like to
write the raw data to HDFS and implement the preprocessing logic as a Hadoop
job instead. The files created by Flume would then be consumed very quickly,
and the preprocessing stage would benefit from the scalability that Hadoop
offers.
This patch essentially exposes a feature of the HDFS file system API to the
Flume user. It is backward compatible, so current usage can continue unchanged.
It merely allows the Flume user to trade reliability against performance, if
they so wish. The user is not limited to reducing replication or block size: if
desired, they can choose a larger block size or more replication (at the cost
of performance). I don't see a downside. Usually new flexibility means lower
performance, a more complex design, or hard-to-maintain code. I don't think any
of those is true in this case. In other words, this patch gives some benefits
to some users at practically no additional cost, either to developers or to
other users.
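To illustrate the backward-compatibility argument, here is a minimal sketch of
how per-sink overrides could be resolved. It is not the code from the attached
patch. The class name SinkWriteOptions, the property keys "replication" and
"blockSize", and the default values in main are all assumptions made for
illustration. A sink that sets neither key simply keeps the defaults it is
given.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical holder for per-sink HDFS write settings; not a Flume class. */
final class SinkWriteOptions {
    final short replication;
    final long blockSize;

    private SinkWriteOptions(short replication, long blockSize) {
        this.replication = replication;
        this.blockSize = blockSize;
    }

    /**
     * Resolves the settings for one sink. The keys "replication" and
     * "blockSize" are assumed names. A missing key falls back to the supplied
     * default, so existing configurations behave exactly as before.
     */
    static SinkWriteOptions resolve(Map<String, String> sinkProps,
                                    short defaultReplication,
                                    long defaultBlockSize) {
        String r = sinkProps.get("replication");
        String b = sinkProps.get("blockSize");
        short replication = (r == null) ? defaultReplication : Short.parseShort(r.trim());
        long blockSize = (b == null) ? defaultBlockSize : Long.parseLong(b.trim());
        if (replication < 1) {
            throw new IllegalArgumentException("replication must be >= 1");
        }
        if (blockSize < 1) {
            throw new IllegalArgumentException("blockSize must be >= 1");
        }
        return new SinkWriteOptions(replication, blockSize);
    }

    public static void main(String[] args) {
        // Example defaults only; a real deployment takes these from the cluster.
        short defaultReplication = 3;
        long defaultBlockSize = 128L * 1024 * 1024;

        // A sink with no overrides keeps the defaults.
        Map<String, String> untouched = new HashMap<>();

        // A sink for short-lived raw data trades reliability for performance.
        Map<String, String> rawEvents = new HashMap<>();
        rawEvents.put("replication", "1");
        rawEvents.put("blockSize", String.valueOf(32L * 1024 * 1024));

        SinkWriteOptions a = resolve(untouched, defaultReplication, defaultBlockSize);
        SinkWriteOptions b = resolve(rawEvents, defaultReplication, defaultBlockSize);
        System.out.println("untouched: replication=" + a.replication + " blockSize=" + a.blockSize);
        System.out.println("rawEvents: replication=" + b.replication + " blockSize=" + b.blockSize);
    }
}
```

On the HDFS side, the resolved values could then be handed to the
FileSystem.create(Path, boolean, int, short, long) overload in the Hadoop API,
which takes the replication factor and block size as its last two arguments.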
> It'll be nice if we can control the HDFS block-size and replication for
> specific HDFS-sink instances
> ----------------------------------------------------------------------------------------------------
>
> Key: FLUME-2003
> URL: https://issues.apache.org/jira/browse/FLUME-2003
> Project: Flume
> Issue Type: Improvement
> Components: Sinks+Sources
> Reporter: Thiruvalluvan M. G.
> Fix For: v1.4.0
>
> Attachments: FLUME-2003.patch
>
>
> The forthcoming patch provides that functionality.