[ 
https://issues.apache.org/jira/browse/HADOOP-12118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14724763#comment-14724763
 ] 

Kengo Seki commented on HADOOP-12118:
-------------------------------------

Sorry [~gliptak] for the late response. I agree with the backport because it 
would surely be useful for 2.x users. But I haven't been able to decide whether 
we should adopt XSD for validation or not.
One possibility is to use XSD for the basic structure validation and XPath (for 
example) for the advanced validations I mentioned above, which avoids walking 
the XML directly, like:

{code}
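    // Needs javax.xml.xpath.*, org.w3c.dom.NodeList, org.xml.sax.InputSource, java.util.*
    XPath xpath = XPathFactory.newInstance().newXPath();
    // Select the text of every <name> under /configuration/property.
    NodeList nodes = (NodeList) xpath.evaluate(
        "/configuration/property/name/text()",
        new InputSource("core-site.xml"), XPathConstants.NODESET);
    // Set.add() returns false when the name has already been seen.
    Set<String> s = new HashSet<String>();
    for (int i = 0; i < nodes.getLength(); i++) {
      String name = nodes.item(i).getTextContent();
      if (!s.add(name)) {
        System.err.println("Found duplicated property: " + name);
      }
    }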
{code} 

It would significantly improve code readability and maintainability, but I have 
one concern. It probably can't report the line numbers where the problems 
occurred, because DOM doesn't keep the positions of elements. That is a 
regression of sorts, but fortunately (or unfortunately?) 3.0 is not released 
yet, so it may be an acceptable trade-off for code simplicity at this point.
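
For the XSD half of the idea, here is a rough sketch of what the basic structure check might look like with javax.xml.validation. When the validator reads from a stream rather than a DOM tree, the SAXParseException passed to the error handler carries a line number, so at least the structural errors could keep their positions. The class name ConfSchemaCheck and the inline schema are made up for illustration; the schema is a minimal stand-in, not the hadoop-configuration.xsd attached to this issue.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXParseException;

public class ConfSchemaCheck {
  // Hypothetical minimal schema, NOT the hadoop-configuration.xsd attachment:
  // <configuration> holds any number of <property> elements, each with
  // a <name> followed by a <value>.
  static final String XSD =
      "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='configuration'><xs:complexType><xs:sequence>"
      + "<xs:element name='property' minOccurs='0' maxOccurs='unbounded'>"
      + "<xs:complexType><xs:sequence>"
      + "<xs:element name='name' type='xs:string'/>"
      + "<xs:element name='value' type='xs:string'/>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

  /** Validates xml against XSD; returns one "line N: message" entry per problem. */
  static List<String> validate(String xml) throws Exception {
    final List<String> problems = new ArrayList<String>();
    Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
        .newSchema(new StreamSource(new StringReader(XSD)));
    Validator validator = schema.newValidator();
    // Collect recoverable errors instead of stopping at the first one.
    validator.setErrorHandler(new ErrorHandler() {
      public void warning(SAXParseException e) { problems.add(format(e)); }
      public void error(SAXParseException e) { problems.add(format(e)); }
      // Malformed XML cannot be validated any further, so rethrow.
      public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
    });
    // A StreamSource (not a DOMSource) lets the parser track positions.
    validator.validate(new StreamSource(new StringReader(xml)));
    return problems;
  }

  static String format(SAXParseException e) {
    return "line " + e.getLineNumber() + ": " + e.getMessage();
  }

  public static void main(String[] args) throws Exception {
    // The typo from the issue description: <key> instead of <name>.
    String xml = "<configuration>\n<property>\n<key>fs.defaultFS</key>\n"
        + "<value>hdfs://localhost:9000</value>\n</property>\n</configuration>";
    for (String p : validate(xml)) {
      System.err.println(p);
    }
  }
}
```

The XPath-based advanced checks would still lose positions in this arrangement, since they run against a DOM tree.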

Thoughts?

> Validate xml configuration files with XML Schema
> ------------------------------------------------
>
>                 Key: HADOOP-12118
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12118
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Christopher Tubbs
>         Attachments: HADOOP-7947.branch-2.1.patch, hadoop-configuration.xsd
>
>
> I spent an embarrassingly long time today trying to figure out why the 
> following wouldn't work.
> {code}
> <property>
>   <key>fs.defaultFS</key>
>   <value>hdfs://localhost:9000</value>
> </property>
> {code}
> I just kept getting an error about no authority for {{fs.defaultFS}}, with a 
> value of {{file:///}}, which made no sense... because I knew it was there.
> The problem was that the {{core-site.xml}} was parsed entirely without any 
> validation. This seems incorrect. The very least that could be done is a 
> simple XML Schema validation against an XSD, before parsing. That way, users 
> will get immediate failures on common typos and other problems in the xml 
> configuration files.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
