Christopher Auston created HADOOP-10138:
-------------------------------------------
Summary: Support custom record separator with streaming
Key: HADOOP-10138
URL: https://issues.apache.org/jira/browse/HADOOP-10138
Project: Hadoop Common
Issue Type: Improvement
Reporter: Christopher Auston
Priority: Minor
We store XML documents as values in sequence files, and the values may contain
newlines. It would be useful for Hadoop Streaming to be able to emit a zero
byte (NUL) instead of a newline as the record delimiter. The mapper script can
then find key-value pair boundaries unambiguously by looking for zero bytes.
I find this really useful because it lets me use a Ruby script for quick ad
hoc queries.
I have a patch with a unit test that I will attach.
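
To illustrate the mapper side, here is a minimal Ruby sketch of reading
zero-byte-delimited records. It is not taken from the patch: the helper name
each_record is hypothetical, and it assumes each record arrives as
"key<TAB>value" terminated by a single zero byte, which may differ from the
layout the patch actually produces.

```ruby
require "stringio"

# Hypothetical helper, not part of Hadoop or the attached patch.
# Assumes each record is "key<TAB>value" terminated by a zero byte.
# Yields [key, value] pairs; newlines inside the value are preserved.
def each_record(io)
  return enum_for(:each_record, io) unless block_given?

  # Read up to each zero byte instead of up to each newline.
  io.each_line("\0") do |record|
    record = record.chomp("\0")
    next if record.empty?

    # Split on the first tab only, so tabs inside the value survive.
    key, value = record.split("\t", 2)
    yield key, value.to_s
  end
end

# Small in-memory demonstration with a multi-line XML value.
sample = StringIO.new("doc1\t<a>\n  <b/>\n</a>\0doc2\t<c/>\0")
each_record(sample) do |key, value|
  puts "#{key}: #{value.lines.count} line(s)"
end
```

In a real streaming mapper the same helper would be called as
each_record($stdin), with the ad hoc query logic placed inside the block.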