[ https://issues.apache.org/jira/browse/MAPREDUCE-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12925281#action_12925281 ]
Ravi Gummadi commented on MAPREDUCE-2153:
-----------------------------------------
Some of the job-specific configuration properties mentioned in MAPREDUCE-1658
that should be brought into the trace file (this list is not complete):
* mapreduce.map.speculative, mapreduce.reduce.speculative
* mapreduce.job.reduce.slowstart.completedmaps
* mapreduce.task.io.sort.factor, mapreduce.task.io.sort.mb,
mapreduce.map.sort.spill.percent
* mapreduce.reduce.shuffle.connect.timeout,
mapreduce.reduce.shuffle.read.timeout
* mapreduce.reduce.shuffle.merge.percent,
mapreduce.reduce.shuffle.input.buffer.percent
* mapreduce.reduce.merge.inmem.threshold
Resolved MAPREDUCE-1658 as a duplicate of this JIRA.
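
To make the list above concrete, here is a minimal sketch of how a trace
builder could keep only the listed properties from a job's configuration. It
uses plain java.util types instead of Hadoop classes so that it runs on its
own. TraceConfigFilter, TRACE_PROPERTIES and filter() are hypothetical names
for illustration; they are not existing Rumen or TraceBuilder APIs.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of Rumen: keeps only whitelisted job properties.
class TraceConfigFilter {

    // Property names copied from the list in this comment.
    static final Set<String> TRACE_PROPERTIES = new LinkedHashSet<>(Arrays.asList(
        "mapreduce.map.speculative",
        "mapreduce.reduce.speculative",
        "mapreduce.job.reduce.slowstart.completedmaps",
        "mapreduce.task.io.sort.factor",
        "mapreduce.task.io.sort.mb",
        "mapreduce.map.sort.spill.percent",
        "mapreduce.reduce.shuffle.connect.timeout",
        "mapreduce.reduce.shuffle.read.timeout",
        "mapreduce.reduce.shuffle.merge.percent",
        "mapreduce.reduce.shuffle.input.buffer.percent",
        "mapreduce.reduce.merge.inmem.threshold"));

    // Returns the subset of jobConf whose keys are whitelisted.
    // Properties that the job did not set are simply left out.
    static Map<String, String> filter(Map<String, String> jobConf) {
        Map<String, String> kept = new LinkedHashMap<>();
        for (String name : TRACE_PROPERTIES) {
            String value = jobConf.get(name);
            if (value != null) {
                kept.put(name, value);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, String> jobConf = new LinkedHashMap<>();
        jobConf.put("mapreduce.task.io.sort.mb", "200");
        jobConf.put("mapreduce.job.name", "wordcount"); // not whitelisted, dropped
        System.out.println(filter(jobConf));
    }
}
```

Keeping the names in one set means that extending the (incomplete) list is a
one-line change, and unset properties never show up in the trace as nulls.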
> Bring in more job configuration properties in to the trace file
> ---------------------------------------------------------------
>
> Key: MAPREDUCE-2153
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2153
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: tools/rumen
> Reporter: Ravi Gummadi
>
> To emulate distributed cache usage in gridmix jobs, the following 9
> configuration properties need to be available in the trace file:
> (1) mapreduce.job.cache.files
> (2) mapreduce.job.cache.files.visibilities
> (3) mapreduce.job.cache.files.filesizes
> (4) mapreduce.job.cache.files.timestamps
> (5) mapreduce.job.cache.archives
> (6) mapreduce.job.cache.archives.visibilities
> (7) mapreduce.job.cache.archives.filesizes
> (8) mapreduce.job.cache.archives.timestamps
> (9) mapreduce.job.cache.symlink.create
> To emulate data compression in gridmix jobs, the trace file should contain
> the following configuration properties:
> (1) mapreduce.map.output.compress
> (2) mapreduce.map.output.compress.codec
> (3) mapreduce.output.fileoutputformat.compress
> (4) mapreduce.output.fileoutputformat.compress.codec
> (5) mapreduce.output.fileoutputformat.compress.type
> Ideally, gridmix should also set job-specific configuration properties such as
> io.sort.mb and io.sort.factor when running simulated jobs, so that they behave
> like the original (real) jobs in terms of spilled records, number of merges,
> etc.
> TraceBuilder should bring all of these properties into the generated trace
> file.
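
As an illustration of why the distributed cache properties in the description
are needed together, here is a minimal sketch of how a consumer of the trace
could combine them into one record per cache file. It assumes that the files,
visibilities, filesizes and timestamps values are comma-separated lists in the
same order, and it only handles the files.* properties (the archives.* ones
would be handled the same way). CacheFileEntries and Entry are hypothetical
names, not existing gridmix or Rumen classes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Rumen or gridmix.
class CacheFileEntries {

    // One distributed cache file together with its recorded metadata.
    static final class Entry {
        final String uri;
        final boolean isPublic;
        final long size;
        final long timestamp;

        Entry(String uri, boolean isPublic, long size, long timestamp) {
            this.uri = uri;
            this.isPublic = isPublic;
            this.size = size;
            this.timestamp = timestamp;
        }
    }

    // Assumption: each property holds a comma-separated list, and position i
    // in every list describes the same cache file.
    static List<Entry> parse(Map<String, String> conf) {
        List<Entry> entries = new ArrayList<>();
        String files = conf.get("mapreduce.job.cache.files");
        if (files == null || files.trim().isEmpty()) {
            return entries; // the job used no cache files
        }
        String[] uris = files.split(",");
        String[] vis = column(conf, "mapreduce.job.cache.files.visibilities", uris.length);
        String[] sizes = column(conf, "mapreduce.job.cache.files.filesizes", uris.length);
        String[] stamps = column(conf, "mapreduce.job.cache.files.timestamps", uris.length);
        for (int i = 0; i < uris.length; i++) {
            entries.add(new Entry(uris[i].trim(),
                                  Boolean.parseBoolean(vis[i].trim()),
                                  Long.parseLong(sizes[i].trim()),
                                  Long.parseLong(stamps[i].trim())));
        }
        return entries;
    }

    // Splits one list-valued property and checks that it has one value per file.
    private static String[] column(Map<String, String> conf, String key, int expected) {
        String value = conf.get(key);
        if (value == null) {
            throw new IllegalArgumentException("missing property: " + key);
        }
        String[] parts = value.split(",");
        if (parts.length != expected) {
            throw new IllegalArgumentException(
                key + " has " + parts.length + " values, expected " + expected);
        }
        return parts;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put("mapreduce.job.cache.files", "hdfs:///a.jar,hdfs:///b.txt");
        conf.put("mapreduce.job.cache.files.visibilities", "true,false");
        conf.put("mapreduce.job.cache.files.filesizes", "10,20");
        conf.put("mapreduce.job.cache.files.timestamps", "1000,2000");
        for (Entry e : parse(conf)) {
            System.out.println(e.uri + " size=" + e.size + " public=" + e.isPublic);
        }
    }
}
```

If any one of the four properties is missing from the trace, the per-file
records cannot be rebuilt, which is why the sketch fails loudly instead of
guessing defaults.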