[ https://issues.apache.org/jira/browse/MAPREDUCE-4630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13502836#comment-13502836 ]

Harsh J commented on MAPREDUCE-4630:
------------------------------------

bq. If there is an API for input block size

What Vinod pointed to is not an API for input _block_ size; it's an API for 
input _split_ size. Splits are purely an MR concept, and that concept works 
across different FileSystem implementations, whereas blocks are not really an 
MR concept at all and have much more to do with HDFS.
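
For reference, those split-size knobs live on the input-format side and apply 
to any FileSystem. A minimal sketch, assuming the {{org.apache.hadoop.mapreduce}} 
API is the one in play here:

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

Job job = Job.getInstance(new Configuration(), "split-size-example");
// Splits bound how input is carved up for map tasks; this is an MR
// concept, independent of the underlying FileSystem's block size.
FileInputFormat.setMinInputSplitSize(job, 128L * 1024 * 1024); // 128 MB
FileInputFormat.setMaxInputSplitSize(job, 256L * 1024 * 1024); // 256 MB
{code}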

Overriding the _HDFS_ block size of output files can already be done through 
the raw job configuration object. A new API to set an HDFS-specific config 
string, one that would be just a wrapper around {{conf.set("dfs.blocksize", val)}} 
or a {{-Ddfs.blocksize=val}}, doesn't sound too attractive.
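
To make that concrete, a minimal sketch of the one-liner (the value is in 
bytes):

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

Configuration conf = new Configuration();
// HDFS output files created by this job's tasks get this block size.
conf.set("dfs.blocksize", "268435456"); // 256 MB, in bytes
Job job = Job.getInstance(conf, "blocksize-example");
{code}

Or, if the job runs through {{ToolRunner}}, the same thing from the command 
line: {{hadoop jar myjob.jar MyTool -Ddfs.blocksize=268435456 ...}}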

The MR program writer can do this directly, and in my experience this override 
is needed far too rarely to warrant a new API call.

As Alex says, wouldn't it be easier for you to keep a local proxy object with 
all the setters and getters you need, and use that in your framework instead? 
It would also give you more flexibility (you can keep adding whatever APIs you 
want).
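
A rough sketch of what such a proxy could look like; the class and method 
names here are hypothetical, purely for illustration:

{code:java}
import org.apache.hadoop.mapreduce.Job;

// Hypothetical framework-side wrapper: it owns the setters your
// framework wants and delegates to the raw Configuration underneath.
public class JobBuilder {
  private final Job job;

  public JobBuilder(Job job) {
    this.job = job;
  }

  // Framework-level setter for the HDFS output block size, in bytes.
  public JobBuilder setOutputBlockSize(long bytes) {
    job.getConfiguration().setLong("dfs.blocksize", bytes);
    return this;
  }

  public Job build() {
    return job;
  }
}
{code}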
                
> API for setting dfs.block.size
> ------------------------------
>
>                 Key: MAPREDUCE-4630
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4630
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>         Environment: Hadoop 2
>            Reporter: Radim Kolar
>            Priority: Minor
>
> Add an API for setting the block size from a Tool when creating an MR job.
> I propose
> FileOutputFormat.setBlockSize(Job job, int blocksize);
> which sets dfs.block.size

