Bring in more job configuration properties into the trace file
--------------------------------------------------------------

                 Key: MAPREDUCE-2153
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2153
             Project: Hadoop Map/Reduce
          Issue Type: Improvement
          Components: tools/rumen
            Reporter: Ravi Gummadi


To emulate distributed cache usage in gridmix jobs, the following nine
configuration properties must be available in the trace file:
(1) mapreduce.job.cache.files
(2) mapreduce.job.cache.files.visibilities
(3) mapreduce.job.cache.files.filesizes
(4) mapreduce.job.cache.files.timestamps

(5) mapreduce.job.cache.archives
(6) mapreduce.job.cache.archives.visibilities
(7) mapreduce.job.cache.archives.filesizes
(8) mapreduce.job.cache.archives.timestamps

(9) mapreduce.job.cache.symlink.create
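
As a rough illustration of how an emulator might consume properties (1)-(4)
or (5)-(8), the sketch below zips the parallel lists into one record per
cached item. It assumes each property holds a comma-separated list with one
entry per item, in matching order. The function name cache_entries and the
sample values are hypothetical and are not part of Rumen or Gridmix.

```python
# Hypothetical helper for illustration only; not a Rumen or Gridmix API.
# Assumption: each property is a comma-separated list, one entry per item.
def cache_entries(conf, prefix="mapreduce.job.cache.files"):
    """Combine the parallel property lists into one dict per cached item."""
    def split(key):
        value = conf.get(key, "")
        return [item.strip() for item in value.split(",")] if value else []

    uris = split(prefix)
    visibilities = split(prefix + ".visibilities")
    sizes = split(prefix + ".filesizes")
    stamps = split(prefix + ".timestamps")

    entries = []
    for i, uri in enumerate(uris):
        entries.append({
            "uri": uri,
            # Missing trailing values are reported as None rather than guessed.
            "public": visibilities[i].lower() == "true" if i < len(visibilities) else None,
            "size": int(sizes[i]) if i < len(sizes) else None,
            "timestamp": int(stamps[i]) if i < len(stamps) else None,
        })
    return entries


# Made-up sample configuration, as it might appear in a trace file.
sample_conf = {
    "mapreduce.job.cache.files": "hdfs://nn/tmp/a.txt,hdfs://nn/tmp/b.jar",
    "mapreduce.job.cache.files.visibilities": "true,false",
    "mapreduce.job.cache.files.filesizes": "1024,2048",
    "mapreduce.job.cache.files.timestamps": "1288000000000,1288000001000",
}
entries = cache_entries(sample_conf)
```

Passing prefix="mapreduce.job.cache.archives" would handle properties (5)-(8)
the same way.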

To emulate data compression in gridmix jobs, the trace file should contain the
following configuration properties:
(1) mapreduce.map.output.compress
(2) mapreduce.map.output.compress.codec
(3) mapreduce.output.fileoutputformat.compress
(4) mapreduce.output.fileoutputformat.compress.codec
(5) mapreduce.output.fileoutputformat.compress.type
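
The selection step could look roughly like the sketch below, which reads a
job configuration in the usual Hadoop XML layout (property elements with name
and value children) and keeps only the five compression properties listed
above. This is not TraceBuilder's actual code; select_properties and the
sample XML are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# The five compression-related properties named in this issue.
COMPRESSION_KEYS = (
    "mapreduce.map.output.compress",
    "mapreduce.map.output.compress.codec",
    "mapreduce.output.fileoutputformat.compress",
    "mapreduce.output.fileoutputformat.compress.codec",
    "mapreduce.output.fileoutputformat.compress.type",
)


def select_properties(job_xml_text, wanted=COMPRESSION_KEYS):
    """Return {name: value} for the wanted properties found in the job XML."""
    root = ET.fromstring(job_xml_text)
    found = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name in wanted:
            found[name] = prop.findtext("value")
    return found


# Made-up sample job configuration; io.sort.mb is present but not selected.
SAMPLE_JOB_XML = """<configuration>
  <property><name>mapreduce.map.output.compress</name><value>true</value></property>
  <property><name>mapreduce.map.output.compress.codec</name><value>org.apache.hadoop.io.compress.GzipCodec</value></property>
  <property><name>io.sort.mb</name><value>100</value></property>
</configuration>"""

selected = select_properties(SAMPLE_JOB_XML)
```

Properties absent from the job configuration are simply left out of the
result, so a consumer can fall back to its own defaults.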

Ideally, gridmix should also set job-specific configuration properties such as
io.sort.mb and io.sort.factor when running simulated jobs, so that they behave
like the original job in terms of spilled records, number of merges, etc.

TraceBuilder should bring all of these properties into the generated trace file.
