Streaming "slowmatch" documentation
-----------------------------------
Key: HADOOP-3680
URL: https://issues.apache.org/jira/browse/HADOOP-3680
Project: Hadoop Core
Issue Type: Bug
Components: contrib/streaming
Affects Versions: 0.17.0
Reporter: Bo Adler
Priority: Trivial
The documentation for the Streaming module does not mention the "slowmatch"
parameter, which checks for CDATA sections while looking for XML records.
An important point is that "slowmatch=true" violates the principle of least
surprise: the "begin" and "end" parameters become regular expressions instead
of exact strings. This is probably a useful feature, but it should definitely
be documented, since users will be tempted to use the XML record reader on
not-strictly-XML files, where the "begin" and "end" patterns may need
escaping.
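
To illustrate the surprise described above, here is a minimal sketch of how the
same marker behaves as an exact string versus as a regular expression. It uses
Python's "re" module purely as a stand-in for the Java regex engine that
Streaming would use, so details of the syntax may differ. The marker
"<item(1)>", the sample text, and the helper names are hypothetical and are
not taken from the Streaming code.

```python
import re


def find_literal(marker, text):
    """Exact-string search: return the index of marker in text, or -1."""
    return text.find(marker)


def find_regex(pattern, text):
    """Regex search: return the start index of the first match, or -1."""
    match = re.search(pattern, text)
    return match.start() if match else -1


# Hypothetical not-strictly-XML input whose markers contain parentheses.
text = "header <item(1)> payload </item(1)>"
begin = "<item(1)>"

# Exact-string matching finds the marker as written.
literal_pos = find_literal(begin, text)

# Treated as a regex, "(1)" is a capture group, so the pattern looks for
# "<item1>" and no longer matches the marker itself.
unescaped_pos = find_regex(begin, text)

# Escaping the metacharacters restores the exact-string behaviour.
escaped_pos = find_regex(re.escape(begin), text)

print(literal_pos, unescaped_pos, escaped_pos)
```

In this sketch the unescaped marker silently fails to match, which is the kind
of behaviour a user switching on "slowmatch=true" might not expect. In Java,
java.util.regex.Pattern.quote plays a role similar to re.escape here.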