Bring in more job configuration properties into the trace file
--------------------------------------------------------------
Key: MAPREDUCE-2153
URL: https://issues.apache.org/jira/browse/MAPREDUCE-2153
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: tools/rumen
Reporter: Ravi Gummadi
To emulate distributed cache usage in gridmix jobs, the following 9 configuration
properties need to be available in the trace file:
(1) mapreduce.job.cache.files
(2) mapreduce.job.cache.files.visibilities
(3) mapreduce.job.cache.files.filesizes
(4) mapreduce.job.cache.files.timestamps
(5) mapreduce.job.cache.archives
(6) mapreduce.job.cache.archives.visibilities
(7) mapreduce.job.cache.archives.filesizes
(8) mapreduce.job.cache.archives.timestamps
(9) mapreduce.job.cache.symlink.create
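As an illustration only, the selection of these nine properties from a job's
configuration could be sketched as below. The class DistCachePropertyFilter and
its extract method are hypothetical names invented for this example and are not
part of Rumen or TraceBuilder; the job configuration is modeled as a plain map.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper for illustration; not an actual Rumen class. */
class DistCachePropertyFilter {
    // The nine distributed cache property names listed in this issue.
    static final List<String> DIST_CACHE_KEYS = Arrays.asList(
        "mapreduce.job.cache.files",
        "mapreduce.job.cache.files.visibilities",
        "mapreduce.job.cache.files.filesizes",
        "mapreduce.job.cache.files.timestamps",
        "mapreduce.job.cache.archives",
        "mapreduce.job.cache.archives.visibilities",
        "mapreduce.job.cache.archives.filesizes",
        "mapreduce.job.cache.archives.timestamps",
        "mapreduce.job.cache.symlink.create");

    /** Returns only the distributed cache entries that are set in jobConf. */
    static Map<String, String> extract(Map<String, String> jobConf) {
        Map<String, String> kept = new LinkedHashMap<>();
        for (String key : DIST_CACHE_KEYS) {
            String value = jobConf.get(key);
            if (value != null) {
                kept.put(key, value);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, String> jobConf = new LinkedHashMap<>();
        jobConf.put("mapreduce.job.cache.files", "hdfs:///tmp/lookup.txt");
        jobConf.put("mapreduce.job.cache.symlink.create", "true");
        jobConf.put("mapreduce.job.name", "wordcount"); // unrelated, filtered out
        System.out.println(extract(jobConf));
    }
}
```

Properties that the original job never set are simply omitted, so a consumer of
the trace has to tolerate missing keys.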
To emulate data compression in gridmix jobs, the trace file should contain the
following configuration properties:
(1) mapreduce.map.output.compress
(2) mapreduce.map.output.compress.codec
(3) mapreduce.output.fileoutputformat.compress
(4) mapreduce.output.fileoutputformat.compress.codec
(5) mapreduce.output.fileoutputformat.compress.type
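For illustration, a consumer of the trace might read these five properties as
sketched below. CompressionSettings is a hypothetical class made up for this
example, and treating a missing compress flag as "false" is an assumption of the
sketch rather than behavior specified in this issue.

```java
import java.util.Properties;

/** Hypothetical reader of a traced job's compression settings. */
class CompressionSettings {
    final boolean mapOutputCompressed;
    final String mapOutputCodec;           // null if the trace has no value
    final boolean jobOutputCompressed;
    final String jobOutputCodec;           // null if the trace has no value
    final String jobOutputCompressionType; // null if the trace has no value

    CompressionSettings(Properties traceProps) {
        // Assumption: an absent flag means the output was not compressed.
        mapOutputCompressed = Boolean.parseBoolean(
            traceProps.getProperty("mapreduce.map.output.compress", "false"));
        mapOutputCodec =
            traceProps.getProperty("mapreduce.map.output.compress.codec");
        jobOutputCompressed = Boolean.parseBoolean(
            traceProps.getProperty("mapreduce.output.fileoutputformat.compress", "false"));
        jobOutputCodec =
            traceProps.getProperty("mapreduce.output.fileoutputformat.compress.codec");
        jobOutputCompressionType =
            traceProps.getProperty("mapreduce.output.fileoutputformat.compress.type");
    }

    public static void main(String[] args) {
        Properties p = new Properties();
        p.setProperty("mapreduce.map.output.compress", "true");
        p.setProperty("mapreduce.map.output.compress.codec",
            "org.apache.hadoop.io.compress.DefaultCodec");
        CompressionSettings s = new CompressionSettings(p);
        System.out.println("map output compressed: " + s.mapOutputCompressed);
        System.out.println("job output compressed: " + s.jobOutputCompressed);
    }
}
```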
Ideally, gridmix should also set many job-specific configuration properties, such
as io.sort.mb and io.sort.factor, when running simulated jobs, so that they have the
same effect as the original job in terms of spilled records, number of merges, etc.
TraceBuilder should therefore bring all of these properties into the generated trace file.
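On the replay side, applying such traced values to a simulated job could look
roughly like the sketch below. SimulatedJobConfigurer and its apply method are
hypothetical and do not reflect gridmix's actual code; the key list holds only
the two properties named above and is not meant to be exhaustive.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical replay-side helper; not an actual gridmix class. */
class SimulatedJobConfigurer {
    // Only the two properties named in this issue; illustrative, not complete.
    static final List<String> REPLAYED_KEYS =
        Arrays.asList("io.sort.mb", "io.sort.factor");

    /**
     * Copies each replayed key found in traceProps into simulatedConf and
     * returns how many keys were copied. Keys missing from the trace are
     * left at whatever value simulatedConf already holds.
     */
    static int apply(Map<String, String> traceProps,
                     Map<String, String> simulatedConf) {
        int copied = 0;
        for (String key : REPLAYED_KEYS) {
            String value = traceProps.get(key);
            if (value != null) {
                simulatedConf.put(key, value);
                copied++;
            }
        }
        return copied;
    }

    public static void main(String[] args) {
        Map<String, String> traceProps = new HashMap<>();
        traceProps.put("io.sort.mb", "256");
        Map<String, String> simulatedConf = new HashMap<>();
        int copied = apply(traceProps, simulatedConf);
        System.out.println(copied + " properties applied: " + simulatedConf);
    }
}
```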