[ https://issues.apache.org/jira/browse/MAPREDUCE-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Todd Lipcon updated MAPREDUCE-2254:
-----------------------------------

      Resolution: Fixed
   Fix Version/s: 0.23.0
        Assignee: Ahmed Radwan
    Release Note: TextInputFormat may now split lines with delimiters other than newline, by specifying a configuration parameter "textinputformat.record.delimiter"
    Hadoop Flags: [Reviewed]
          Status: Resolved  (was: Patch Available)

> Allow setting of end-of-record delimiter for TextInputFormat
> ------------------------------------------------------------
>
>                 Key: MAPREDUCE-2254
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2254
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Ahmed Radwan
>            Assignee: Ahmed Radwan
>             Fix For: 0.23.0
>
>         Attachments: MAPREDUCE-2245.patch, MAPREDUCE-2254_r2.patch,
> MAPREDUCE-2254_r3.patch
>
>
> It would be useful to allow setting the end-of-record delimiter for
> TextInputFormat. The current implementation hardcodes '\n', '\r' or '\r\n' as
> the only possible record delimiters. This is a problem if users have embedded
> newlines in their data fields (which is pretty common). It is also a
> problem for other tools that use TextInputFormat (see for example:
> https://issues.apache.org/jira/browse/PIG-836 and
> https://issues.cloudera.org/browse/SQOOP-136).
> I have written a patch to address this issue. The patch allows users to
> specify any custom end-of-record delimiter using a newly added configuration
> property. For backward compatibility, if this new configuration property is
> absent, the exact same delimiters as before are used (i.e., '\n', '\r' or
> '\r\n').

--
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
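
The delimiter behavior described in the issue can be illustrated with a small standalone sketch. This is not the patch itself and does not use any Hadoop classes: the class `RecordDelimiterSketch` and the method `splitRecords` are hypothetical names chosen for illustration. A `null` (or empty) delimiter stands in for the "textinputformat.record.delimiter" property being absent, in which case the sketch falls back to '\n', '\r' or '\r\n'.

```java
import java.util.ArrayList;
import java.util.List;

public class RecordDelimiterSketch {

    // Splits data into records. A null or empty delimiter models the
    // configuration property being absent, so '\n', '\r' or '\r\n' is used.
    // (Hypothetical helper for illustration; not Hadoop's implementation.)
    static List<String> splitRecords(String data, String delimiter) {
        List<String> records = new ArrayList<>();
        int start = 0;

        if (delimiter == null || delimiter.isEmpty()) {
            int i = 0;
            while (i < data.length()) {
                char c = data.charAt(i);
                if (c == '\n' || c == '\r') {
                    records.add(data.substring(start, i));
                    // Treat "\r\n" as a single delimiter.
                    if (c == '\r' && i + 1 < data.length()
                            && data.charAt(i + 1) == '\n') {
                        i++;
                    }
                    start = i + 1;
                }
                i++;
            }
        } else {
            int idx;
            while ((idx = data.indexOf(delimiter, start)) >= 0) {
                records.add(data.substring(start, idx));
                start = idx + delimiter.length();
            }
        }

        // Keep a trailing record that is not followed by a delimiter.
        if (start < data.length()) {
            records.add(data.substring(start));
        }
        return records;
    }

    public static void main(String[] args) {
        // One field contains an embedded newline; records end with "##".
        String data = "a\nb##c##";
        System.out.println(splitRecords(data, null).size() + " records with default delimiters");
        System.out.println(splitRecords(data, "##").size() + " records with custom delimiter");
    }
}
```

With the default delimiters, the embedded newline breaks the first record in two; with the custom delimiter "##" it stays inside one record. In an actual Hadoop job the delimiter would presumably be supplied by setting the "textinputformat.record.delimiter" property on the job configuration, as the release note states.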