[ https://issues.apache.org/jira/browse/CRUNCH-564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14935936#comment-14935936 ]
Muhammad commented on CRUNCH-564:
---------------------------------

It appears to work with \ as the escape character. I'll update if I run into issues.

On configuration options: I thought you required every option to be set, because if I do not provide CSV_BUFFER_SIZE it crashes with an NPE. The following is the code snippet that fails:

{code}
final String bufferValue = this.configuration.get(CSVFileSource.CSV_BUFFER_SIZE);
if ("".equals(bufferValue)) {
  bufferSize = CSVLineReader.DEFAULT_BUFFER_SIZE;
} else {
  bufferSize = Integer.parseInt(bufferValue);
}
{code}

If I do not provide CSV_INPUT_FILE_ENCODING it crashes as well. In both cases the cause is that

{code}
this.configuration.get(CSVFileSource.CSV_INPUT_FILE_ENCODING/CSV_BUFFER_SIZE)
{code}

returns *null* rather than an empty string, so execution falls into the *else* clause.

I'm using {code}org.apache.mrunit:mrunit:1.1.0:hadoop2{code} and {code}org.apache.hadoop:hadoop-mapreduce-client-jobclient:2.6.0{code}

> Add support for using escape character same as open/close quote character
> -------------------------------------------------------------------------
>
>                 Key: CRUNCH-564
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-564
>             Project: Crunch
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Muhammad
>            Assignee: Josh Wills
>            Priority: Trivial
>              Labels: csv, csvparser
>
> As a user I would like to use CSVInputFormat to handle the CSV files
> following this RFC http://www.ietf.org/rfc/rfc4180.txt.
> Many developers use Apache StringEscapeUtils.escapeCsv( ) method to escape
> their CSVs. The method escapes the CSV following the RFC4180.
> https://commons.apache.org/proper/commons-lang/javadocs/api-2.6/org/apache/commons/lang/StringEscapeUtils.html
> The CSVLineReader throws exception in such a case. We can enhance the code to
> support the CSVs that use escape same as the quote characters.
> https://github.com/apache/crunch/blob/master/crunch-core/src/main/java/org/apache/crunch/io/text/csv/CSVLineReader.java#L152
> I would appreciate a comment, if someone has knowingly rejected the idea due
> to some technical limitation or a problem with allowing escape and quote as
> same characters. By the way Apache HAWQ seem to get around this issue somehow
> and reads such CSVs alright.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
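
For reference, the null-versus-empty-string behaviour described in the comment can be sketched as below. This is only an illustration, not the Crunch code or a proposed patch: a plain java.util.Map stands in for the Hadoop Configuration, and the class name, the key string and the default value are hypothetical placeholders. The point is that a missing key yields null, which the "".equals(...) check does not catch, so the guard has to cover both cases.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only. The Map stands in for the job configuration;
// the key name and default below are hypothetical, not the Crunch constants.
final class BufferSizeLookup {
    static final String CSV_BUFFER_SIZE = "csv.buffersize";   // hypothetical key
    static final int DEFAULT_BUFFER_SIZE = 64 * 1024;         // hypothetical default

    /** Returns the configured buffer size, or the default when the key is unset or blank. */
    static int bufferSize(Map<String, String> configuration) {
        // get() returns null when the key is absent, not an empty string
        final String bufferValue = configuration.get(CSV_BUFFER_SIZE);
        if (bufferValue == null || bufferValue.trim().isEmpty()) {
            return DEFAULT_BUFFER_SIZE;
        }
        return Integer.parseInt(bufferValue.trim());
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        System.out.println(bufferSize(conf));   // key absent: prints 65536
        conf.put(CSV_BUFFER_SIZE, "");
        System.out.println(bufferSize(conf));   // empty value: prints 65536
        conf.put(CSV_BUFFER_SIZE, "4096");
        System.out.println(bufferSize(conf));   // explicit value: prints 4096
    }
}
```

With a real Hadoop Configuration, the get(name, defaultValue) and getInt(name, defaultValue) overloads give the same fallback without an explicit null check.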