[
https://issues.apache.org/jira/browse/MAPREDUCE-2151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12925063#action_12925063
]
Hong Tang commented on MAPREDUCE-2151:
--------------------------------------
bq. I think Hong is referring to configuration parameters that are likely to
modify the behaviour of the job and tasks (e.g mapred.child.* , mapreduce.job.*
etc).
No, this is not what this jira intends to solve, but this jira could
potentially help. Currently Rumen extracts from jobconf.xml some key-values
specific to the map-reduce layer, and converts them to regular primitive types.
I think the extraction of mapred.child.*, mapreduce.job.*, etc. should continue
along this path.
However, we have started to think about using Rumen output to analyze the
performance of frameworks built on top of map-reduce. One example is Pig. Pig
will add more information to jobconf.xml to describe the features being used,
along with compile-time statistics. We need a mechanism in Rumen to retain such
information in an extensible way, and that is the primary purpose of this jira.
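To illustrate what "retain as-is" could look like, here is a minimal sketch
that reads the property/name/value layout of a jobconf.xml into a plain string
map without converting any values. The class name JobConfProperties and the
pig.example.* key are made up for illustration; this is not the Rumen parser.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper (not part of Rumen): keeps jobconf entries as raw strings.
class JobConfProperties {

    // Parses <configuration><property><name/><value/></property>...</configuration>
    // and returns the name -> value pairs in document order, values untouched.
    static Map<String, String> parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, String> props = new LinkedHashMap<>();
            NodeList nodes = doc.getElementsByTagName("property");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element property = (Element) nodes.item(i);
                String name = childText(property, "name");
                String value = childText(property, "value");
                // Skip entries that lack either a name or a value.
                if (name != null && value != null) {
                    props.put(name, value);
                }
            }
            return props;
        } catch (Exception e) {
            throw new IllegalArgumentException("malformed jobconf XML", e);
        }
    }

    // Returns the text of the first child element with the given tag, or null.
    private static String childText(Element parent, String tag) {
        NodeList list = parent.getElementsByTagName(tag);
        return list.getLength() == 0 ? null : list.item(0).getTextContent();
    }

    public static void main(String[] args) {
        String xml = "<configuration>"
                + "<property><name>pig.example.features</name><value>GROUP_BY</value></property>"
                + "<property><name>io.sort.mb</name><value>100</value></property>"
                + "</configuration>";
        System.out.println(parse(xml));
    }
}
```

Keeping the values as strings leaves their interpretation to whatever tool
consumes the trace, so Rumen would not need to know about each framework's keys.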
bq. Also *-default.xml might not be available for reference comparison.
Correct. That is the main reason we have to make each parsed LoggedJob instance
self-contained.
bq. Hmm. But I guess we need to bring in more and more configuration properties
soon.
Yes, the set will grow, but not without bound. I think we can support
extraction of properties based on exact match or prefixes.
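As a rough sketch of the exact-match-or-prefix idea, assuming the jobconf has
already been read into a string map: the class JobConfFilter and the
pig.example.* keys below are hypothetical and only illustrate the selection
rule, not an actual Rumen interface.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper (not part of Rumen): selects jobconf entries whose key
// either equals a configured name or starts with a configured prefix.
class JobConfFilter {
    private final Set<String> exactKeys;
    private final List<String> prefixes;

    JobConfFilter(Set<String> exactKeys, List<String> prefixes) {
        this.exactKeys = exactKeys;
        this.prefixes = prefixes;
    }

    // True if the key matches exactly or begins with any configured prefix.
    boolean accepts(String key) {
        if (exactKeys.contains(key)) {
            return true;
        }
        for (String prefix : prefixes) {
            if (key.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    // Returns the accepted entries in their original order, values as-is.
    Map<String, String> extract(Map<String, String> jobConf) {
        Map<String, String> retained = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : jobConf.entrySet()) {
            if (accepts(entry.getKey())) {
                retained.put(entry.getKey(), entry.getValue());
            }
        }
        return retained;
    }

    public static void main(String[] args) {
        Map<String, String> jobConf = new LinkedHashMap<>();
        jobConf.put("io.sort.mb", "100");
        jobConf.put("pig.example.features", "GROUP_BY"); // made-up key
        jobConf.put("fs.default.name", "hdfs://example:8020");

        JobConfFilter filter = new JobConfFilter(
                new HashSet<>(Arrays.asList("io.sort.mb")),
                Arrays.asList("pig.example."));
        System.out.println(filter.extract(jobConf));
    }
}
```

Because the set of exact keys and prefixes is finite and configured up front,
the retained map stays bounded even as frameworks add new properties.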
bq. Created MAPREDUCE-2153 to get other needed configuration properties in to
the trace file.
This seems to be in addition to MAPREDUCE-1658. I suggest you roll the two
jiras into one (closing MR-1658 and rolling the work into MR-2153).
bq. Also created MAPREDUCE-2152 for avoiding TraceBuilder's its own handling of
deprecated configuration properties in favour of Configuration object.
The purpose of this jira is to extend the set of key-values extracted by the
jobconf parser and retain them as-is in the LoggedJob object. So I believe your
point is relatively orthogonal to this jira. FWIW, I am a bit concerned about
introducing this dependency between Rumen and MapReduce, because I think the
handling of deprecated conf parameters is not really a core part of the
MapReduce API and could be dropped in the future (which would lead us to move
the code into Rumen, similar to the case of Pre21JobHistoryConstants).
> [rumen] Add a map of jobconf key-value pairs in LoggedJob
> ---------------------------------------------------------
>
> Key: MAPREDUCE-2151
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2151
> Project: Hadoop Map/Reduce
> Issue Type: New Feature
> Components: tools/rumen
> Reporter: Hong Tang
>
> It'd be useful to retain application level configuration settings in
> LoggedJob.