Streaming: better control over input splits
-------------------------------------------
Key: HADOOP-2278
URL: https://issues.apache.org/jira/browse/HADOOP-2278
Project: Hadoop
Issue Type: Improvement
Components: contrib/streaming
Reporter: arkady borkovsky
In streaming, the map command usually expects to receive its input
uninterpreted, just as it is stored in DFS.
However, the split (the beginning and the end of the portion of data that goes
to a single map task) is often important and cannot fall at just any line break.
Often the input consists of multi-line documents, for example XML.
There should be a way to specify a pattern that separates logical records.
The existing "Streaming XML record reader" partly provides this functionality.
However, it is accepted that "Streaming XML" is a hack and needs to be replaced.
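
To illustrate what "a pattern that separates logical records" could mean, here
is a minimal Python sketch, not part of Hadoop. The function name
logical_records and its begin_pattern parameter are hypothetical. It only
groups lines of a stream into multi-line records, starting a new record at
each line that matches the pattern. It does not handle the harder problem of a
split that begins in the middle of a record.

```python
import io
import re
from typing import Iterator, TextIO


def logical_records(stream: TextIO, begin_pattern: str) -> Iterator[str]:
    """Yield multi-line records from stream.

    A new record starts at every line matching begin_pattern (hypothetical
    parameter). Any lines before the first match are yielded as one
    leading record.
    """
    begin = re.compile(begin_pattern)
    current = []
    for line in stream:
        # A matching line closes the record collected so far, if any.
        if begin.search(line) and current:
            yield "".join(current)
            current = []
        current.append(line)
    # Emit the final record once the stream is exhausted.
    if current:
        yield "".join(current)


if __name__ == "__main__":
    sample = "<doc>\n<t>a</t>\n</doc>\n<doc>\n<t>b</t>\n</doc>\n"
    for record in logical_records(io.StringIO(sample), r"^<doc>"):
        print(repr(record))
```

In this sketch each <doc> element reaches the consumer as one unit, whereas
plain line-oriented splitting could hand the first half of a document to one
map task and the second half to another.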