[ http://issues.apache.org/jira/browse/HADOOP-439?page=all ]

Sameer Paranjpye reassigned HADOOP-439:
---------------------------------------

    Assignee: Michel Tourn

> Streaming does not work for text data if the records don't fit in a short UTF8 [2^16/3 characters]
> --------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-439
>                 URL: http://issues.apache.org/jira/browse/HADOOP-439
>             Project: Hadoop
>          Issue Type: Bug
>    Affects Versions: 0.5.0
>            Reporter: Dick King
>         Assigned To: Michel Tourn
>            Priority: Critical
>             Fix For: 0.6.0
>
>
> The streaming code internally reads the input data into a UTF8. This causes
> truncated data to be shipped to the mapper when the input exceeds about 21000
> characters, with no notice to the user except possibly in individual tasks'
> machines' logs, which people would not normally read for apparently
> successful jobs.
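
For readers wondering where the "2^16/3 characters" figure in the issue title comes from, here is a minimal sketch of the arithmetic. It is not Hadoop code: encode_short_utf8 and decode_short_utf8 are hypothetical helpers that assume a string format with a 16-bit length prefix and a worst case of 3 encoded bytes per character, and that truncate silently, as the report describes.

```python
import struct

# Assumption: the encoded length is stored in an unsigned 16-bit field.
MAX_ENCODED_BYTES = 0xFFFF                # 65535
# Assumption: worst case of 3 encoded bytes per character.
MAX_SAFE_CHARS = MAX_ENCODED_BYTES // 3   # 21845, i.e. "about 21000 characters"


def encode_short_utf8(text: str) -> bytes:
    """Hypothetical encoder: 2-byte big-endian length, then the UTF-8 payload.

    Records longer than MAX_SAFE_CHARS are silently cut, mimicking the
    truncation described in the report. Characters outside the Basic
    Multilingual Plane (4 bytes in UTF-8) are not handled by this sketch.
    """
    truncated = text[:MAX_SAFE_CHARS]
    payload = truncated.encode("utf-8")
    return struct.pack(">H", len(payload)) + payload


def decode_short_utf8(data: bytes) -> str:
    """Hypothetical decoder for the format written by encode_short_utf8."""
    (length,) = struct.unpack(">H", data[:2])
    return data[2:2 + length].decode("utf-8")


record = "x" * 30000                      # one long text record
round_tripped = decode_short_utf8(encode_short_utf8(record))
print(MAX_SAFE_CHARS, len(round_tripped))  # prints "21845 21845"
```

The 30000-character record comes back as 21845 characters with no error raised, which is the kind of silent data loss the reporter is describing.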