Streaming "slowmatch" documentation
-----------------------------------

                 Key: HADOOP-3680
                 URL: https://issues.apache.org/jira/browse/HADOOP-3680
             Project: Hadoop Core
          Issue Type: Bug
          Components: contrib/streaming
    Affects Versions: 0.17.0
            Reporter: Bo Adler
            Priority: Trivial


The documentation for the Streaming module does not mention the "slowmatch" 
parameter, which checks for CDATA sections while looking for XML records.

An important point is that "slowmatch=true" violates the principle of least 
surprise: the "begin" and "end" parameters become regular expressions instead 
of exact strings.  This is probably a useful feature, but it should definitely 
be documented, since users will be tempted to use the XML record reader on 
not-strictly-XML files, where the "begin" and "end" patterns may need 
escaping.
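
To illustrate the difference, here is a minimal sketch in Python. It is not 
the Streaming code: find_record_start is a hypothetical helper, the sample 
data is made up, and Python's "re" module stands in for the Java regular 
expressions that Hadoop itself would use (where Pattern.quote plays the role 
of re.escape). It only shows how a marker containing regex metacharacters 
behaves when it is treated as a pattern rather than an exact string.

```python
import re


def find_record_start(text, begin, slowmatch):
    """Return the index where `begin` first matches in `text`, or -1.

    Hypothetical helper: exact substring search when slowmatch is off,
    regular-expression search when it is on.
    """
    if slowmatch:
        match = re.search(begin, text)
        return match.start() if match else -1
    return text.find(begin)


# Made-up, not-strictly-XML input whose record marker uses square brackets.
data = "header [rec] a [/rec] [rec] b [/rec]"
begin = "[rec]"

# Exact-string matching finds the literal marker.
exact = find_record_start(data, begin, slowmatch=False)

# As a regex, "[rec]" is a character class matching 'r', 'e', or 'c',
# so it matches inside the word "header" instead.
as_regex = find_record_start(data, begin, slowmatch=True)

# Escaping the marker restores the literal meaning under regex matching.
escaped = find_record_start(data, re.escape(begin), slowmatch=True)

print(exact, as_regex, escaped)
```

With the unescaped marker, the regex search stops at the first "e" in 
"header" rather than at the intended "[rec]", which is the kind of surprise 
the documentation should warn about.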

