Streaming: better control over input splits
-------------------------------------------
Key: HADOOP-2278
URL: https://issues.apache.org/jira/browse/HADOOP-2278
Project: Hadoop
Issue Type: Improvement
Components: contrib/streaming
Reporter: arkady borkovsky

In streaming, the map command usually expects to receive its input uninterpreted, exactly as it is stored in DFS. However, the split (the beginning and the end of the portion of data that goes to a single map task) is often important, and "any line break" is not always an acceptable boundary. Often the input consists of multi-line documents, e.g. XML.

There should be a way to specify a pattern that separates logical records.

The existing "Streaming XML record reader" partly provides this functionality. However, it is generally accepted that "Streaming XML" is a hack and needs to be replaced.
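To make the request concrete, here is a minimal sketch of how a split could be aligned to logical records once a separator is known. It is not Hadoop code and not a proposed implementation: the function name `records_in_split`, its parameters, and the rule "a split owns every record that starts inside it" are assumptions made for illustration. It also assumes a literal byte-string separator that cannot overlap itself, rather than a general pattern.

```python
def records_in_split(data: bytes, start: int, end: int, delimiter: bytes) -> list[bytes]:
    """Return the logical records owned by the byte range [start, end).

    Assumed convention (hypothetical, for illustration only):
    - a record starts at offset 0 or right after a delimiter;
    - a split owns every record whose first byte lies in [start, end);
    - the last owned record may extend past `end`.
    """
    if not delimiter:
        raise ValueError("delimiter must be non-empty")

    if start == 0:
        pos = 0
    else:
        # Skip the partial record at the head of the split: find the first
        # delimiter that ends at or after `start`, and begin right after it.
        idx = data.find(delimiter, max(0, start - len(delimiter)))
        if idx == -1:
            return []  # no record starts inside this split
        pos = idx + len(delimiter)

    records = []
    while pos < end and pos < len(data):
        idx = data.find(delimiter, pos)
        if idx == -1:
            records.append(data[pos:])  # final record without a trailing delimiter
            break
        records.append(data[pos:idx])   # may read past `end` to finish the record
        pos = idx + len(delimiter)
    return records


if __name__ == "__main__":
    sample = b"<doc>a\nb</doc>\n<doc>c</doc>\n"
    # Cut the input at an arbitrary byte offset; each document still goes to one split.
    print(records_in_split(sample, 0, 7, b"</doc>\n"))
    print(records_in_split(sample, 7, len(sample), b"</doc>\n"))
```

With this convention, a multi-line document is never divided between two map tasks, wherever the byte boundary of the split happens to fall.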