[ https://issues.apache.org/jira/browse/BIGTOP-1689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14360207#comment-14360207 ]
Michael Weiser commented on BIGTOP-1689: ---------------------------------------- I had a look at it but haven't actually tested it. From what I can see the implementation is fine and a lot of effort seems to have gone into it. So I'd hate to shoot it down. Reparsing the XML document certainly has the advantage that the settings can be merged properly and duplicate entries be avoided. However, only XML seems to be supported at the moment. Support for other file formats such key=value (*.cfg, hadoop-env.sh) would need additional functions and would quickly lead to a need of some sort of plug-in system. Also, the custom parser function somewhat obfuscates how this logic works and makes it less accessible to other developers for future extensions. The alternative of generating the whole config file from a hash would be very intrusive and would need to handle cases where currently we insert certain parts of the config only if some condition is met and values are generated e.g. from arrays using some sort of concatenation or looping over it and inserting multiple settings at once. I have to admit, I do not know if the Hadoop XML configuration file syntax actually requires each setting to be given only once or if it supports a parse order for overriding the same setting within the same file. It looks like XML so assuming the former is natural but not necessarily true. Can an actual Hadoop expert chime in here? [~cos]? Also, we should consider if it's actually necessary to allow *overriding* of settings or if it would be enough to allow *adding* of additional settings. We already have a logic to adjust settings we currently put into the configs. If there are static values, then those could and should be eliminated by either removing them and using the default, adding a new setting as an explicit class variable or implementing the additional option logic discussed here and adding that option as a default value there. If approached that way, adding of addtional settings should be enough and the need for merging and overriding disappears. Without actually having tried it, I think it should be possible to implement a hybrid approach of keeping the current templates but adding a footer that generates additional config entries from the override hash using something like this: {noformat} <% @override_hash.each do |setting,value| -%> <property> <name><%= setting %></name> <value><%= value %></value> </property> <% end -%> {noformat} This could be adjusted for different file formats as necessary. Duplicate settings (the hash overriding the template) would need to be handled by the config file parser of the respective Hadoop component. This would need to be investigated for each config file format. As, said, for the most commonly used XML format I'm not sure if and how this will work. In the worst case, trying to override a setting would lead to undefined behaviour by the configuration parser of the software. I don't think there's an easy way to prevent that. An error message and failure to start from the software would IMO be acceptable. To avoid duplicating that code in each template, it would be great if it could be recursively included from a separate fragment. However, from a quick look at the documentation, Puppet doesn't seem to allow that. It does allow to specify multiple template files to the template function however. So something like this should work: {noformat} template('hdfs-site.xml', 'override.xml') {noformat} The only disadvantage here is that for XML files the closing "</configuration>" tag would need to move from the main template into the footer leaving two non-XML-compliant template fragments. It might also be worth a look how perhaps the puppetlabs concat or other modules can help implement this. I'd rather have a dependency on a standard module everyone has installed already anyways (such as concat) than any kind of special logic. [~petersla]: What do you think? Am I too naive in my approach to this? > puppet: Allow merging arbitrary site configuration > -------------------------------------------------- > > Key: BIGTOP-1689 > URL: https://issues.apache.org/jira/browse/BIGTOP-1689 > Project: Bigtop > Issue Type: Improvement > Components: deployment > Affects Versions: 0.8.0 > Reporter: Peter Slawski > Assignee: Peter Slawski > Fix For: 0.9.0 > > Attachments: BIGTOP-1689.1.patch > > > Puppet should be flexible in allowing arbitrary configuration name value > pairs to be merged into a given site.xml file that was generated from a > template. > For example, the following could be included in site.yaml which would add a > configuration entry for hadoop.tmp.dir in core-site.xml: > {code} > hadoop::common_hdfs::hadoop_core_site_overrides: > "hadoop.tmp.dir": "/mnt/var/lib/hadoop/tmp" > {code} > This could be implemented as a puppet custom-function taking in the output of > template: > {code} > file { > "/etc/hadoop/conf/core-site.xml": > content => merge_site(template('hadoop/core-site.xml'), > $hadoop_core_site_overrides) > require => [Package["hadoop"]], > } > {code} > Perhaps another approach would be to have site.xml templates be created from > a single map of name value pairs. The merge would happen before the file > content is generated from the template. -- This message was sent by Atlassian JIRA (v6.3.4#6332)