[
https://issues.apache.org/jira/browse/MAPREDUCE-4630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13502836#comment-13502836
]
Harsh J commented on MAPREDUCE-4630:
------------------------------------
bq. If there is an API for input block size
What Vinod pointed to is not an API for input _block_ size; it's an API for
input _split_ size. Splits are purely an MR concept and work across different
FileSystem implementations, whereas blocks aren't really an MR concept at all
and have more to do with HDFS.
Overriding output file _HDFS_ block sizes can already be done through the raw
job configuration object itself. A new API to set an HDFS-specific config
string, which would be just a wrapper around {{conf.set("dfs.blocksize", val)}}
or {{-Ddfs.blocksize=val}}, doesn't sound too attractive.
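For illustration, a minimal sketch of the raw-configuration route (assuming
Hadoop 2's {{Job}} API; the 256 MB value is just an example):

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class BlockSizeExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Override the HDFS block size for this job's output files via the
    // raw configuration; no new FileOutputFormat API is needed.
    conf.setLong("dfs.blocksize", 256L * 1024 * 1024); // 256 MB, example value

    Job job = Job.getInstance(conf, "block-size-example");
    // ... the usual mapper/reducer/input/output setup would follow here.
  }
}
{code}

The same override also works from the command line for any {{Tool}}-based job,
since {{GenericOptionsParser}} honours {{-Ddfs.blocksize=268435456}}.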
The MR program author can do this directly, and in my experience this override
is needed too rarely to warrant a new API call.
As Alex says, wouldn't it be easier for you to have a local proxy object with
all your needed setters and getters, and use that for your framework instead?
It would give you more flexibility at the same time (to keep adding your
desired APIs).
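For example, a rough sketch of such a proxy (the class and method names here
are hypothetical, not part of any existing Hadoop API):

{code:java}
import org.apache.hadoop.mapreduce.Job;

// Hypothetical framework-side wrapper: collect convenience settings
// locally, then apply them all to the real Job configuration in one place.
public class JobSettings {
  private long blockSize = -1; // unset by default

  public JobSettings setBlockSize(long bytes) {
    this.blockSize = bytes;
    return this; // chainable, so settings can be strung together
  }

  public long getBlockSize() {
    return blockSize;
  }

  public void applyTo(Job job) {
    if (blockSize > 0) {
      job.getConfiguration().setLong("dfs.blocksize", blockSize);
    }
    // ... apply any further framework-specific settings here.
  }
}
{code}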
> API for setting dfs.block.size
> ------------------------------
>
> Key: MAPREDUCE-4630
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4630
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Environment: Hadoop 2
> Reporter: Radim Kolar
> Priority: Minor
>
> Add API for setting block size in Tool while creating MR job.
> I propose
> FileOutputFormat.setBlockSize(Job job, int blocksize);
> which sets dfs.block.size