[jira] [Commented] (FLUME-2003) It'll be nice if we can control the HDFS block-size and replication for specific HDFS-sink instances

Thiruvalluvan M. G. (JIRA) Tue, 16 Apr 2013 21:49:25 -0700

    [ https://issues.apache.org/jira/browse/FLUME-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13633781#comment-13633781 ]


Thiruvalluvan M. G. commented on FLUME-2003:
--------------------------------------------

In my case, we use Flume to collect events whose size varies from very small to 
very large. At present, a preprocessing stage trims the events in order to 
reduce the load on HDFS. That preprocessing stage is neither reliable nor 
scalable. We'd like to insert the raw data into HDFS and implement the 
preprocessing logic as a Hadoop job, so the files created by Flume will be 
consumed very quickly. This way we can exploit the scalability Hadoop offers 
for the preprocessing stage.

This patch essentially exposes a feature of the HDFS file system API to the 
Flume user. It is backward compatible, so existing configurations continue to 
work unchanged. It merely allows the Flume user to make a trade-off between 
reliability and performance, if they so wish. The user is not limited to 
reducing replication or block size: if desired, they can choose a larger block 
size or more replication (at the cost of performance). I don't see a downside. 
Usually new flexibility comes with lower performance, a more complex design, or 
hard-to-maintain code. I don't think any of those applies in this case. In 
other words, this patch gives some benefits to some users at practically no 
additional cost, either to developers or to other users.
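
To illustrate the backward-compatibility point, here is a minimal Java sketch, 
assuming each sink instance receives its settings as a string map. The class 
SinkFileSettings and the property names hdfs.replication and hdfs.blockSize 
are hypothetical and not taken from the attached patch. When a key is absent, 
the sink keeps the file system defaults, so existing configurations behave as 
before. The resolved values are the kind of per-file arguments accepted by 
Hadoop's FileSystem.create(Path, boolean, int, short, long) overload.

```java
import java.util.Map;

/**
 * Hypothetical sketch: resolves the replication and block size that one
 * HDFS-sink instance would use when creating a file. Keys missing from the
 * sink's configuration fall back to the supplied file system defaults.
 */
final class SinkFileSettings {
    // Hypothetical property names, not confirmed Flume configuration keys.
    static final String REPLICATION_KEY = "hdfs.replication";
    static final String BLOCK_SIZE_KEY = "hdfs.blockSize";

    final short replication;
    final long blockSize;

    private SinkFileSettings(short replication, long blockSize) {
        this.replication = replication;
        this.blockSize = blockSize;
    }

    static SinkFileSettings resolve(Map<String, String> sinkConf,
                                    short defaultReplication,
                                    long defaultBlockSize) {
        // Start from the defaults so unconfigured sinks are unaffected.
        short replication = defaultReplication;
        String r = sinkConf.get(REPLICATION_KEY);
        if (r != null) {
            replication = Short.parseShort(r.trim());
            if (replication < 1) {
                throw new IllegalArgumentException("replication must be >= 1: " + r);
            }
        }

        long blockSize = defaultBlockSize;
        String b = sinkConf.get(BLOCK_SIZE_KEY);
        if (b != null) {
            blockSize = Long.parseLong(b.trim());
            if (blockSize <= 0) {
                throw new IllegalArgumentException("block size must be > 0: " + b);
            }
        }
        return new SinkFileSettings(replication, blockSize);
    }
}
```

A sink whose files are consumed quickly could set hdfs.replication to 1 in 
this sketch, while every other sink instance, having no such key, keeps the 
defaults it was given.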
                
> It'll be nice if we can control the HDFS block-size and replication for 
> specific HDFS-sink instances
> ----------------------------------------------------------------------------------------------------
>
>                 Key: FLUME-2003
>                 URL: https://issues.apache.org/jira/browse/FLUME-2003
>             Project: Flume
>          Issue Type: Improvement
>          Components: Sinks+Sources
>            Reporter: Thiruvalluvan M. G.
>             Fix For: v1.4.0
>
>         Attachments: FLUME-2003.patch
>
>
> The forthcoming patch provides that functionality.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira