[ https://issues.apache.org/jira/browse/HADOOP-6105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12727143#action_12727143 ]
Philip Zeyliger commented on HADOOP-6105:
-----------------------------------------

I'm not enamored of this approach and would like to propose a slightly heavier-weight, but, I think, cleaner approach than stuffing more logic into the Configuration class. My apologies for coming to this conversation a bit late. If you don't want to read a long e-mail, skip down to the code examples at the bottom. :)

Before I get to the proposal, I wanted to lay out what I think the goals are. Note that HADOOP-475 is also related.

* Standardization of configuration names, documentation, and value formats. Today, the names tend to appear in the code, or, at best, in constants in the code, and the documentation, when it exists, may be in -default.xml. It would be nice if it was very difficult to avoid writing documentation for the variable you're introducing. Right now there are and have been a handful of bugs where the default in the code is different than the default in the XML file, and that gets really confusing.
* Backwards compatibility. We'd love to rename "mapred.foo" and "mr.bar" to be consistent, but we want to maintain backwards compatibility. This ticket is all about that.
* Availability to user code. Users should be able to use configuration the same way the core does. Users pass information to their jobs via Configuration, and they should use the same mechanism. This is true today.
* Type-safety. Configurations have a handful of recurring types: number of bytes, filename, URI, hostport combination, arrays of paths, etc. The parsing is done in an ad-hoc fashion, which is a shame, since it doesn't have to be. It would be nice to have some generic runtime checking of configuration parameters, too, and perhaps even ranges (that number can't be negative!).
* Upgradeability to a different configuration format.
  I don't think we'll leave a place where configuration has to be a key->value map (especially because of "availability to user code"), but it would eventually be nice if configuration could be queried from other places, or if the values could have a bit more structure. (For example, we could use XML to separate out a list of paths, instead of blindly using comma-delimited, unescaped text.)
* Development ease. It ought to be easier to find the places where configuration is used. Today the best we can do is a grep, and then follow references manually.
* Autogenerated documentation. No-brainer.
* Ability to specify visibility, scope, and stability. Along the lines of HADOOP-5073, configuration variables should be classified as deprecated, unstable, evolving, and stable. It would be nice to introduce variables (say, ones used for tuning) with the expectation that they are not part of the public API. Use at your own risk sort of thing.

My proposal is to represent every configuration variable that's accessed in the Hadoop code by a static instance of a ConfigVariable<T> class. The interface is something like:

{code}
public interface ConfigVariable<T> {
  T get(Configuration conf);
  T getDefault();
  void set(Configuration conf, T value);
  String getHelp();
}
{code}

There's more than one way to implement this. Here's one proposal that uses Java annotations:

{code}
@ConfigDescription(help="Some help text", visibility=Visibility.PUBLIC)
@ConfigAccessors({
  @ConfigAccessor(name="common.sample"),
  @ConfigAccessor(name="core.sample", deprecated="Use common.sample instead")
})
public final static ConfigVariable<Integer> myConfigVariable =
    ConfigVariables.newIntConfigVariable(15 /* default value */);
{code}

This approach would require pre-processing the annotations into a data file at build time, and then querying this data file at runtime. (It's not easily possible to get at the annotations on the field from within myConfigVariable.)
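Independent of how the variables end up being declared, the runtime side of the interface could look roughly like the sketch below. It only illustrates the idea of a typed variable that reads its current name first and falls back to a deprecated one. A plain Map<String, String> stands in for Configuration so the example compiles on its own, and IntConfigVariable, its constructor arguments, and the warning text are hypothetical names, not part of any patch.

```java
import java.util.HashMap;
import java.util.Map;

// Entry point comes first so the file can also be run directly as a single source file.
class ConfigVariableSketch {
    public static void main(String[] args) {
        // "common.sample" is the current name, "core.sample" the deprecated one.
        IntConfigVariable sample =
            new IntConfigVariable(15, "Some help text", "common.sample", "core.sample");

        Map<String, String> conf = new HashMap<>();
        System.out.println(sample.get(conf)); // nothing set, so the default: 15

        conf.put("core.sample", "7");         // an old config file still uses the old key
        System.out.println(sample.get(conf)); // found under the deprecated name: 7

        sample.set(conf, 42);                 // writes always go to the current name
        System.out.println(sample.get(conf)); // the current name wins: 42
    }
}

// The proposed interface, with a Map standing in for Configuration (assumption).
interface ConfigVariable<T> {
    T get(Map<String, String> conf);
    T getDefault();
    void set(Map<String, String> conf, T value);
    String getHelp();
}

// Hypothetical integer-typed variable with one current and one deprecated key.
final class IntConfigVariable implements ConfigVariable<Integer> {
    private final int defaultValue;
    private final String help;
    private final String name;
    private final String deprecatedName;

    IntConfigVariable(int defaultValue, String help, String name, String deprecatedName) {
        this.defaultValue = defaultValue;
        this.help = help;
        this.name = name;
        this.deprecatedName = deprecatedName;
    }

    @Override
    public Integer get(Map<String, String> conf) {
        String raw = conf.get(name);
        if (raw == null) {
            // Fall back to the old key so existing config files keep working.
            raw = conf.get(deprecatedName);
            if (raw != null) {
                System.err.println("Config key " + deprecatedName
                    + " is deprecated; use " + name + " instead");
            }
        }
        if (raw == null) {
            return defaultValue;
        }
        try {
            // Parsing lives in one place instead of at every call site.
            return Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException(
                "Value for " + name + " is not an integer: " + raw, e);
        }
    }

    @Override
    public Integer getDefault() {
        return defaultValue;
    }

    @Override
    public void set(Map<String, String> conf, Integer value) {
        conf.put(name, Integer.toString(value));
    }

    @Override
    public String getHelp() {
        return help;
    }
}
```

Callers never see the key strings, so renaming a key means touching one declaration rather than grepping for every use.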
I'm half-way to getting this working, and I actually think something like the following would be better:

{code}
@ConfigVariableDeclaration
public final static ConfigVariable<URI> fsDefaultName = ConfigVariableBuilder.newURI()
    .setDefault(null)
    .setHelp("Default filesystem")
    .setVisibility(Visibility.PUBLIC)
    .addAccessor("fs.default.name")
    .addDeprecatedAccessor("core.default.fs", "Use foo instead")
    .addValidator(new ValidateSupportedFilesystem());
{code}

This would still require build-time preprocessing (javac supports this) to find the variables, instantiate them, and output the documentation, but the rest of the processing is easy at runtime.

An open question with this approach is how to handle defaults that refer to other variables. Perhaps the easiest thing to do is to handle the same syntax we support now, like 'addIndirectDefault("${default.dir}/mapred")', but something that references the other variable directly is more appealing, e.g.: 'addIndirectDefault(OtherClass.class, "fieldname")'.

I think this can be implemented relatively quickly, with little risk of breaking stuff (because the old way of using Configuration continues to work).

What do you think?

> Provide a way to automatically handle backward compatibility of deprecated keys
> -------------------------------------------------------------------------------
>
>                 Key: HADOOP-6105
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6105
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: conf
>            Reporter: Hemanth Yamijala
>
> There are cases when we have had to deprecate configuration keys. Use cases
> include, changing the names of variables to better match intent, splitting a
> single parameter into two - for maps, reduces etc.
> In such cases, we typically provide a backwards compatible option for the old
> keys. The handling of such cases might typically be common enough to actually
> add support for it in a generic fashion in the Configuration class.
> Some initial discussion around this started in HADOOP-5919, but since the project
> split happened in between we decided to open this issue to fix it in common.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.