[ https://issues.apache.org/jira/browse/MAPREDUCE-7309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Peter Bacsko updated MAPREDUCE-7309:
------------------------------------
    Hadoop Flags: Reviewed
      Resolution: Fixed
          Status: Resolved  (was: Patch Available)

> Improve performance of reading resource request for mapper/reducers from config
> -------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-7309
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7309
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: applicationmaster
>    Affects Versions: 3.0.0, 3.1.0, 3.2.0, 3.3.0
>            Reporter: Wangda Tan
>            Assignee: Peter Bacsko
>            Priority: Major
>             Fix For: 3.2.2, 3.4.0, 3.1.5, 3.3.1
>
>         Attachments: MAPREDUCE-7309-003.patch, MAPREDUCE-7309-004.patch,
> MAPREDUCE-7309-005.patch, MAPREDUCE-7309-branch-3.1-001.patch,
> MAPREDUCE-7309-branch-3.2-001.patch, MAPREDUCE-7309-branch-3.3-001.patch,
> MAPREDUCE-7309.001.patch, MAPREDUCE-7309.002.patch
>
>
> This issue could affect all releases that include YARN-6927.
> Basically, we run a regex match repeatedly when we read the mapper/reducer
> resource request from the config. With a large config file and a large
> number of splits, this can take a long time.
> We saw the AM take hours to parse the config with 200k+ splits and a
> large config file (hundreds of KB).
> The problematic part is this:
> {noformat}
> private void populateResourceCapability(TaskType taskType) {
>   String resourceTypePrefix =
>       getResourceTypePrefix(taskType);
>   boolean memorySet = false;
>   boolean cpuVcoresSet = false;
>   if (resourceTypePrefix != null) {
>     List<ResourceInformation> resourceRequests =
>         ResourceUtils.getRequestedResourcesFromConfig(conf,
>             resourceTypePrefix);
> {noformat}
> Inside {{ResourceUtils.getRequestedResourcesFromConfig()}}, we call
> {{Configuration.getValByRegex()}}, which goes through all property keys that
> come from the MapReduce job configuration (jobconf.xml). If the job config is
> large (e.g. because the job is part of an MR pipeline and its config was
> populated by an earlier job), this runs a regex match over all properties
> again and again. This is unnecessary, because all mappers share the same
> config, and so do all reducers.
> We should do proper caching for pre-configured resource requests.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
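
The caching idea proposed in the description can be sketched roughly as follows. This is only an illustration under stated assumptions, not the committed patch: the class `ResourceRequestCache`, its methods, and the scan counter are hypothetical names, a plain `Map<String, String>` stands in for the Hadoop `Configuration`, and the prefix strings are used for illustration only. The point is that the regex scan over all property keys runs once per resource-type prefix and the result is reused by every later lookup.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.regex.Pattern;

/** Hypothetical sketch: cache the result of a regex scan per prefix. */
final class ResourceRequestCache {
  // Stand-in for the job configuration (assumption: a plain key/value map).
  private final Map<String, String> conf;
  // One cached result per resource-type prefix (e.g. mapper, reducer).
  private final Map<String, Map<String, String>> cache = new ConcurrentHashMap<>();
  // Counts full scans of the configuration; for demonstration only.
  private final AtomicInteger scans = new AtomicInteger();

  ResourceRequestCache(Map<String, String> conf) {
    this.conf = conf;
  }

  // The expensive step: one regex match for every property key.
  private Map<String, String> scan(String prefix) {
    scans.incrementAndGet();
    Pattern pattern = Pattern.compile(Pattern.quote(prefix) + "[^.]+");
    Map<String, String> found = new HashMap<>();
    for (Map.Entry<String, String> e : conf.entrySet()) {
      if (pattern.matcher(e.getKey()).matches()) {
        // Keep only the resource name that follows the prefix.
        found.put(e.getKey().substring(prefix.length()), e.getValue());
      }
    }
    return Collections.unmodifiableMap(found);
  }

  // Returns the cached result, scanning only on the first call per prefix.
  Map<String, String> requestedResources(String prefix) {
    return cache.computeIfAbsent(prefix, this::scan);
  }

  int scanCount() {
    return scans.get();
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    conf.put("mapreduce.map.resource.memory-mb", "2048");
    conf.put("mapreduce.map.resource.vcores", "2");
    for (int i = 0; i < 1000; i++) {
      conf.put("unrelated.property." + i, "value");
    }
    ResourceRequestCache cache = new ResourceRequestCache(conf);
    // Simulate many task attempts asking for the same mapper resources.
    for (int task = 0; task < 200_000; task++) {
      cache.requestedResources("mapreduce.map.resource.");
    }
    System.out.println("resources: " + cache.requestedResources("mapreduce.map.resource."));
    System.out.println("config scans: " + cache.scanCount());
  }
}
```

Keying the cache by prefix keeps mapper and reducer requests separate while still sharing one result among all tasks of the same type. The sketch assumes the configuration does not change after the first lookup.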