[ https://issues.apache.org/jira/browse/SQOOP-1495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14124387#comment-14124387 ]

Peter Hannam edited comment on SQOOP-1495 at 9/6/14 7:55 AM:
-------------------------------------------------------------

Have attached a patch which checks whether the enclosing and escaping characters are set to the default (\000) and, if they are, makes sure they are ignored.


was (Author: petehannam):
diff --git a/src/java/org/apache/sqoop/lib/RecordParser.java b/src/java/org/apache/sqoop/lib/RecordParser.java
index 7c29151..377d07b 100644
--- a/src/java/org/apache/sqoop/lib/RecordParser.java
+++ b/src/java/org/apache/sqoop/lib/RecordParser.java
@@ -230,7 +230,10 @@ public RecordParser(final com.cloudera.sqoop.lib.DelimiterSet delimitersIn) {
     char recordDelim = delimiters.getLinesTerminatedBy();
     char escapeChar = delimiters.getEscapedBy();
     boolean enclosingRequired = delimiters.isEncloseRequired();
-
+    boolean enclosingAllowed = enclosingChar != com.cloudera.sqoop.lib.DelimiterSet.NULL_CHAR;
+    boolean escapeAllowed = escapeChar != com.cloudera.sqoop.lib.DelimiterSet.NULL_CHAR;
+    
+    
     for (int pos = 0; pos < len; pos++) {
       curChar = input.get();
       switch (state) {
@@ -242,10 +245,10 @@ public RecordParser(final com.cloudera.sqoop.lib.DelimiterSet delimitersIn) {
         }
 
         sb = new StringBuilder();
-        if (enclosingChar == curChar) {
+        if (enclosingAllowed && enclosingChar == curChar) {
           // got an opening encloser.
           state = ParseState.ENCLOSED_FIELD;
-        } else if (escapeChar == curChar) {
+        } else if (escapeAllowed && escapeChar == curChar) {
           state = ParseState.UNENCLOSED_ESCAPE;
         } else if (fieldDelim == curChar) {
           // we have a zero-length field. This is a no-op.
@@ -267,7 +270,7 @@ public RecordParser(final com.cloudera.sqoop.lib.DelimiterSet delimitersIn) {
         break;
 
       case ENCLOSED_FIELD:
-        if (escapeChar == curChar) {
+        if (escapeAllowed && escapeChar == curChar) {
           // the next character is escaped. Treat it literally.
           state = ParseState.ENCLOSED_ESCAPE;
         } else if (enclosingChar == curChar) {
@@ -282,7 +285,7 @@ public RecordParser(final com.cloudera.sqoop.lib.DelimiterSet delimitersIn) {
         break;
 
       case UNENCLOSED_FIELD:
-        if (escapeChar == curChar) {
+        if (escapeAllowed && escapeChar == curChar) {
           // the next character is escaped. Treat it literally.
           state = ParseState.UNENCLOSED_ESCAPE;
         } else if (fieldDelim == curChar) {
diff --git a/src/test/com/cloudera/sqoop/lib/TestRecordParser.java b/src/test/com/cloudera/sqoop/lib/TestRecordParser.java
index 8b11d39..ab76ed5 100644
--- a/src/test/com/cloudera/sqoop/lib/TestRecordParser.java
+++ b/src/test/com/cloudera/sqoop/lib/TestRecordParser.java
@@ -409,5 +409,13 @@ public void testRepeatedParse() throws RecordParser.ParseError {
     assertListsEqual(null, list(strings2),
         parser.parseRecord("foo,\"bar\""));
   }
+  
+  public void testTwoFieldsWithQuoteBeforeDelim() throws RecordParser.ParseError {
+    char[] input = new char[] {'A', (char) 0, '|', 'B'};
+
+    RecordParser parser = new RecordParser(new DelimiterSet('|', '\n', '\0', '\0', false));
+    String[] strings = {"A\u0000", "B"};
+    assertListsEqual(null, list(strings), parser.parseRecord(input));
+  }
 
 }
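
To try the effect of the two guards without building Sqoop, here is a simplified, self-contained state machine modelled on the parse states named in the patch. {{NulSafeParser}} and {{parse}} are hypothetical names, and this is a sketch, not Sqoop's {{RecordParser}}: it throws {{IllegalArgumentException}} instead of {{RecordParser.ParseError}} and ignores details such as {{isEncloseRequired}}. It applies the same rule as the patch, namely that a \000 enclosing or escape character means "none", so a literal NUL in the data stays in the field.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified stand-in for Sqoop's RecordParser (not the real class).
 * It mirrors the patch: '\0' as the enclosing or escape character means "none",
 * so a NUL that appears in the data is kept as ordinary field content.
 */
final class NulSafeParser {
  static final char NULL_CHAR = '\0';

  private enum State {
    FIELD_START, ENCLOSED_FIELD, ENCLOSED_ESCAPE, ENCLOSED_EXPECT_DELIMITER,
    UNENCLOSED_FIELD, UNENCLOSED_ESCAPE
  }

  static List<String> parse(String input, char fieldDelim, char recordDelim,
                            char enclosingChar, char escapeChar) {
    // The two guards added by the patch.
    boolean enclosingAllowed = enclosingChar != NULL_CHAR;
    boolean escapeAllowed = escapeChar != NULL_CHAR;

    List<String> fields = new ArrayList<>();
    StringBuilder sb = new StringBuilder();
    State state = State.FIELD_START;

    scan:
    for (int pos = 0; pos < input.length(); pos++) {
      char c = input.charAt(pos);
      switch (state) {
        case FIELD_START:
          if (enclosingAllowed && c == enclosingChar) {
            state = State.ENCLOSED_FIELD;            // opening encloser
          } else if (escapeAllowed && c == escapeChar) {
            state = State.UNENCLOSED_ESCAPE;
          } else if (c == fieldDelim) {
            fields.add("");                          // zero-length field
          } else if (c == recordDelim) {
            break scan;
          } else {
            sb.append(c);
            state = State.UNENCLOSED_FIELD;
          }
          break;
        case ENCLOSED_FIELD:
          if (escapeAllowed && c == escapeChar) {
            state = State.ENCLOSED_ESCAPE;
          } else if (c == enclosingChar) {
            state = State.ENCLOSED_EXPECT_DELIMITER; // closing encloser
          } else {
            sb.append(c);                            // delimiters are literal here
          }
          break;
        case ENCLOSED_ESCAPE:
          sb.append(c);                              // escaped char taken literally
          state = State.ENCLOSED_FIELD;
          break;
        case ENCLOSED_EXPECT_DELIMITER:
          if (c == recordDelim) {
            break scan;
          } else if (c != fieldDelim) {
            throw new IllegalArgumentException("expected delimiter at position " + pos);
          }
          fields.add(sb.toString());
          sb.setLength(0);
          state = State.FIELD_START;
          break;
        case UNENCLOSED_FIELD:
          if (escapeAllowed && c == escapeChar) {
            state = State.UNENCLOSED_ESCAPE;
          } else if (c == fieldDelim) {
            fields.add(sb.toString());               // end of this field
            sb.setLength(0);
            state = State.FIELD_START;
          } else if (c == recordDelim) {
            break scan;
          } else {
            sb.append(c);                            // a NUL lands here when escaping is off
          }
          break;
        case UNENCLOSED_ESCAPE:
          sb.append(c);
          state = State.UNENCLOSED_FIELD;
          break;
      }
    }
    if (state == State.ENCLOSED_FIELD || state == State.ENCLOSED_ESCAPE
        || state == State.UNENCLOSED_ESCAPE) {
      throw new IllegalArgumentException("record ended inside a field or escape");
    }
    fields.add(sb.toString());                       // last field
    return fields;
  }
}
```

With the input from the new test case, {{parse("A\0|B", '|', '\n', '\0', '\0')}} returns the two fields "A\u0000" and "B". With real characters configured (for example '"' to enclose and '\\' to escape), quoting and escaping still work as before.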


> EnclosedBy and EscapedBy set to \000 are not ignored
> ----------------------------------------------------
>
>                 Key: SQOOP-1495
>                 URL: https://issues.apache.org/jira/browse/SQOOP-1495
>             Project: Sqoop
>          Issue Type: Bug
>    Affects Versions: 1.4.5
>            Reporter: Peter Hannam
>            Priority: Minor
>         Attachments: patch.diff
>
>
> In {{DelimiterSet}} there is the following comment above two of its fields:
> {code:java}
> // If these next two fields are '\000', then they are ignored.
> private char enclosedBy;
> private char escapedBy;
> {code}
> We just found a problem with this while doing a Sqoop export without 
> setting the enclosing or escaping parameters (i.e. leaving them at the 
> default \000).  Looking at the code in {{RecordParser}}, although the 
> comment says these fields are ignored when set to \000, they actually 
> aren't.
> For some reason, some of the records we're trying to export have \000 in a 
> column.  This is fine as long as the \000 isn't immediately before the delimiter.
> {{foo\000bar|moo}} is fine: two columns are exported.
> {{foo\000|bar}} isn't: only one column is exported.
> Looking through {{RecordParser}}, the problem is that our \000 character is 
> matched as a special character instead of being ignored.  In the middle of a 
> field it matches {{escapedBy}}, so the delimiter after it is treated as an 
> escaped literal and becomes part of the value; at the start of a field it 
> would match {{enclosedBy}} in the same way.  Both default to \000, which is 
> meant to say "ignore this value", yet a \000 in the data still gets picked up.
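
The symptom can be reproduced outside Sqoop with a much smaller splitter. {{NulDefaultRepro.split}} and its {{guardNul}} flag are hypothetical names for this sketch, which models only the escape check that {{RecordParser}} applies inside an unenclosed field (the {{UNENCLOSED_FIELD}} case touched by the patch). Under that assumption, a \000 in the middle of a field is consumed as the escape character, and whatever follows it, including the {{|}} delimiter, is kept as data.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical reproduction of the reported symptom (not Sqoop code). It models only
 * the escape check on an unenclosed field, which is where a mid-field '\0' is matched.
 */
final class NulDefaultRepro {
  /**
   * Splits one record on fieldDelim; escapeChar means "take the next character literally".
   * guardNul = false: a '\0' escapeChar is still matched (the behaviour in the report).
   * guardNul = true: a '\0' escapeChar means "no escape character" (what DelimiterSet promises).
   */
  static List<String> split(String record, char fieldDelim, char escapeChar, boolean guardNul) {
    boolean escapeAllowed = !guardNul || escapeChar != '\0';
    List<String> fields = new ArrayList<>();
    StringBuilder sb = new StringBuilder();
    boolean escaping = false;
    for (int i = 0; i < record.length(); i++) {
      char c = record.charAt(i);
      if (escaping) {
        sb.append(c);          // escaped char is kept literally, even a delimiter
        escaping = false;
      } else if (escapeAllowed && c == escapeChar) {
        escaping = true;       // the escape char itself is dropped
      } else if (c == fieldDelim) {
        fields.add(sb.toString());
        sb.setLength(0);
      } else {
        sb.append(c);
      }
    }
    fields.add(sb.toString()); // a dangling escape at the very end is ignored in this sketch
    return fields;
  }
}
```

For example, {{split("foo\0|bar", '|', '\0', false)}} yields the single field {{foo|bar}}, matching the one-column export in the report, while {{guardNul = true}} yields {{foo\0}} and {{bar}}. In this sketch {{foo\0bar|moo}} still splits into two fields without the guard, but the \000 is silently dropped ({{foobar}}, {{moo}}).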



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
