[
https://issues.apache.org/jira/browse/HADOOP-2866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12570982#action_12570982
]
Aaron Kimball commented on HADOOP-2866:
---------------------------------------
Joydeep,
You're definitely correct.
But in general, there are several problems with the JobConf system from a
software engineering point of view:
1) Naming conventions don't exist: foo.bar.camelBaz, foo.bar.noncamelbaz, and
foo.bar.dots.between.each.word are all in use.
2) The hierarchy imposed by the keys in the JobConfs has nothing to do with
which modules actually use them. Two isolated modules can both depend on the
same key for arbitrarily different functionality, coupling them to each other,
and no system exists to prevent this.
3) The hierarchy is arbitrarily ignored: why does "map.input.file" exist, when
there is already an established "mapred.map" hierarchy? What is the difference
between "hadoop.job.*" and "job.*"? Shouldn't everything in the entire
system technically be hadoop.*?
4) Most config options are hardcoded throughout the source as raw strings; they
are not placed in public static final Strings at the head of the dependent
class, nor are they "registered" in any way with JobConf.
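As a rough illustration of point 4, here is a minimal sketch of a class that
declares the key it depends on as a constant instead of repeating the raw
string. The class name CompressionSettings, the "RECORD" default, and the use
of a plain Map in place of JobConf are all assumptions made to keep the
example self-contained; only the key name comes from the issue description.

```java
import java.util.Map;

// Hypothetical example class; it is not part of Hadoop.
final class CompressionSettings {
  /** Configuration key this class depends on, declared once at the top. */
  public static final String MAP_OUTPUT_COMPRESSION_TYPE =
      "mapred.map.output.compression.type";

  /** Illustrative default only; an assumption for this sketch. */
  public static final String DEFAULT_COMPRESSION_TYPE = "RECORD";

  private CompressionSettings() {}

  /** Looks the key up through the constant, never through a raw string. */
  public static String compressionType(Map<String, String> conf) {
    return conf.getOrDefault(MAP_OUTPUT_COMPRESSION_TYPE,
                             DEFAULT_COMPRESSION_TYPE);
  }
}
```

With the key in one named constant, a misspelling inside the source becomes a
compile error rather than a silently ignored setting, and a search for the
constant finds every module that depends on the key.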
I think that a major refactoring of JobConf & friends is probably necessary to
address all these issues. Furthermore, coding standards need to cover the
formatting and hierarchy of config strings, so that the problem is also
approached from the human side.
So for starters we can:
1) Add this mechanism, which at the very least will catch typos in user
configurations
2) Encourage people who commit user patches to develop and enforce guidelines
for naming conventions
3) Encourage people who commit user patches to require that patches update the
JobConfValidator if they deprecate key names.
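To make the three steps above concrete, here is a minimal sketch of the kind
of check being proposed. It is not the JobConfValidator from the patch: the
class KeyValidatorSketch, its method names, and the warning wording are
hypothetical. The reserved prefixes and the example keys are taken from the
issue description.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; the real JobConfValidator may look quite different.
final class KeyValidatorSketch {
  // Namespaces owned by Hadoop, per the issue description.
  private static final List<String> RESERVED_PREFIXES =
      Arrays.asList("mapred.", "fs.", "dfs.");

  private final Set<String> knownKeys = new HashSet<>();
  private final Map<String, String> deprecated = new HashMap<>();

  /** Registers a key that some module legitimately reads. */
  public void register(String key) {
    knownKeys.add(key);
  }

  /** Records that oldKey has been superseded by newKey. */
  public void deprecate(String oldKey, String newKey) {
    deprecated.put(oldKey, newKey);
    knownKeys.add(newKey);
  }

  /** Returns one warning per deprecated or unrecognized reserved key. */
  public List<String> validate(Iterable<String> keys) {
    List<String> warnings = new ArrayList<>();
    for (String key : keys) {
      String replacement = deprecated.get(key);
      if (replacement != null) {
        warnings.add("Deprecated key '" + key + "'; use '" + replacement + "'");
      } else if (isReserved(key) && !knownKeys.contains(key)) {
        warnings.add("Unrecognized key '" + key + "'");
      }
      // Keys outside the reserved namespaces are user-defined: no warning.
    }
    return warnings;
  }

  private static boolean isReserved(String key) {
    for (String prefix : RESERVED_PREFIXES) {
      if (key.startsWith(prefix)) {
        return true;
      }
    }
    return false;
  }
}
```

After deprecate("mapred.output.compression.type",
"mapred.map.output.compression.type"), validating that old key yields a
deprecation warning, a misspelled mapred.* key yields an "unrecognized"
warning, and "myprogram.mytask.myvalue" yields nothing.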
And longer-term, I may file another JIRA to address the rest of this.
> JobConf should validate key names in well-defined namespaces and warn on
> misspelling
> ------------------------------------------------------------------------------------
>
> Key: HADOOP-2866
> URL: https://issues.apache.org/jira/browse/HADOOP-2866
> Project: Hadoop Core
> Issue Type: Improvement
> Components: mapred
> Affects Versions: 0.16.0
> Reporter: Aaron Kimball
> Priority: Minor
> Fix For: 0.16.1, 0.17.0
>
> Original Estimate: 72h
> Remaining Estimate: 72h
>
> A discussion on the mailing list reveals that some configuration strings in
> the JobConf are deprecated over time and new configuration names replace them:
> e.g., "mapred.output.compression.type" is now replaced with
> "mapred.map.output.compression.type".
> Programmers who have been manually specifying the former string, however,
> receive no diagnostic output during testing to suggest that their compression
> type is being silently ignored.
> It would be desirable to notify developers of this change by printing a
> warning message when deprecated configuration names are used in a newer
> version of Hadoop. More generally, when any configuration string in the
> mapred.*, fs.*, dfs.*, etc. namespaces is provided by a user and is not
> recognized by Hadoop, it is desirable to print a warning to indicate a
> malformed configuration. No warnings should be printed when configuration
> keys are in user-defined namespaces (e.g., "myprogram.mytask.myvalue").