[
https://issues.apache.org/jira/browse/HCATALOG-506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Greg Malewicz updated HCATALOG-506:
-----------------------------------
Description: Allow the user to specify the desired number of input splits
through a new configuration parameter hcatalog.desiredNumInputSplits. Two
existing parameters may also need to be specified: mapred.min.split.size and
mapred.max.split.size. This is useful when there are few but large input files
that we want to split into many splits, so as to enhance the parallelism of
loading the splits. (was: Allow user to specify the desired number of input
splits through a new configuration parameter hcatalog.desiredNumInputSplits.
Two existing parameters may also need to be specified: mapred.min.split.size
and mapred.max.split.size. This is useful when there are few but large input
files that we want to split into many splits, so as to enhance the parallelism
of loading the splits.)
> desired number of input splits for large files
> ----------------------------------------------
>
> Key: HCATALOG-506
> URL: https://issues.apache.org/jira/browse/HCATALOG-506
> Project: HCatalog
> Issue Type: Improvement
> Affects Versions: 0.4
> Reporter: Greg Malewicz
> Labels: performance
> Attachments: HCATALOG-506.patch
>
> Original Estimate: 1h
> Remaining Estimate: 1h
>
> Allow the user to specify the desired number of input splits through a new
> configuration parameter hcatalog.desiredNumInputSplits. Two existing
> parameters may also need to be specified: mapred.min.split.size and
> mapred.max.split.size. This is useful when there are few but large input
> files that we want to split into many splits, so as to enhance the
> parallelism of loading the splits.
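
To illustrate why the two existing parameters may also need to be specified, here is a minimal sketch of how a desired split count could interact with the minimum and maximum split sizes. It is not taken from HCATALOG-506.patch: the class SplitSizeSketch and its helpers are hypothetical, a plain Map stands in for the Hadoop job configuration, and the clamping rule max(min, min(max, goal)) is only an assumption modelled on the usual FileInputFormat-style calculation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not the code from HCATALOG-506.patch.
public class SplitSizeSketch {

    // Assumed rule: derive a goal size from the desired split count,
    // then clamp it into the range [minSize, maxSize].
    static long splitSize(long totalBytes, int desiredSplits, long minSize, long maxSize) {
        long goal = (totalBytes + desiredSplits - 1) / desiredSplits; // ceiling division
        return Math.max(minSize, Math.min(maxSize, goal));
    }

    // Number of splits that actually results from a given split size.
    static long numSplits(long totalBytes, long splitSize) {
        return (totalBytes + splitSize - 1) / splitSize; // ceiling division
    }

    public static void main(String[] args) {
        // A plain map standing in for the job configuration (assumption).
        Map<String, String> conf = new HashMap<>();
        conf.put("hcatalog.desiredNumInputSplits", "64");
        conf.put("mapred.min.split.size", String.valueOf(1L << 20));   // 1 MiB
        conf.put("mapred.max.split.size", String.valueOf(256L << 20)); // 256 MiB

        long totalBytes = 8L << 30; // a few large input files, 8 GiB in total
        int desired = Integer.parseInt(conf.get("hcatalog.desiredNumInputSplits"));
        long min = Long.parseLong(conf.get("mapred.min.split.size"));
        long max = Long.parseLong(conf.get("mapred.max.split.size"));

        long size = splitSize(totalBytes, desired, min, max);
        System.out.println("split size = " + size + " bytes, splits = " + numSplits(totalBytes, size));
    }
}
```

Under these assumptions, a minimum split size larger than the goal size overrides the desired count: with a 512 MiB minimum, the same 8 GiB of input yields 16 splits rather than 64, which is why the minimum may have to be lowered alongside the new parameter.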