[ 
https://issues.apache.org/jira/browse/SQOOP-1495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Hannam updated SQOOP-1495:
--------------------------------
    Description: 
In {{DelimiterSet}} there is the following comment above two option variables:

{code:java}
// If these next two fields are '\000', then they are ignored.
private char enclosedBy;
private char escapedBy;
{code}

We just found a problem with this whilst doing a Sqoop export without setting 
the parameters for enclosing or escaping (i.e. they're left at the default \000).  
Looking at the code in {{RecordParser}}, it appears that although the comment 
says they would be ignored if set to \000, they actually aren't.

For some reason some of the records we're trying to export have \000 in a 
column.  This is fine as long as the \000 isn't just before the delimiter.

This is fine {{foo\000bar|moo}} - two columns are exported.
This isn't fine {{foo\000|bar}} - only one column is exported.

Looking through {{RecordParser}}, the problem is that our \000 character is 
being treated as an enclosing character, so the parser then assumes the 
delimiter is part of a value.  {{enclosedBy}} is \000 by default, which should 
mean its value is ignored, but when a \000 is encountered in the data it is 
still picked up.
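
The symptom can be reproduced outside Sqoop with a toy parser. The sketch below is an illustration only, not the code in {{RecordParser}}: the class {{NullSentinelSketch}}, its {{parse}} method and the {{treatUnsetAsDisabled}} flag are hypothetical names. It assumes a parser that compares every character against {{escapedBy}} and {{enclosedBy}} without first checking for the \000 sentinel. In this toy model the \000 is consumed through the escape comparison, which is enough to swallow the following delimiter; whether the real parser takes the enclosing or the escaping path has not been verified here.

```java
import java.util.ArrayList;
import java.util.List;

// Toy parser for illustration only; this is NOT Sqoop's RecordParser.
final class NullSentinelSketch {
    // Sentinel that the DelimiterSet comment says should mean "ignored".
    static final char UNSET = '\000';

    // Splits record on fieldDelim, honouring enclosedBy and escapedBy.
    // treatUnsetAsDisabled is a hypothetical switch: when false, '\000' is
    // compared like any other character (the reported behaviour); when true,
    // a '\000' setting disables the corresponding check.
    static List<String> parse(String record, char fieldDelim, char enclosedBy,
                              char escapedBy, boolean treatUnsetAsDisabled) {
        boolean useEnclose = !treatUnsetAsDisabled || enclosedBy != UNSET;
        boolean useEscape = !treatUnsetAsDisabled || escapedBy != UNSET;
        List<String> fields = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean enclosed = false; // currently inside an enclosed value
        boolean escaped = false;  // previous character was the escape character
        for (int i = 0; i < record.length(); i++) {
            char c = record.charAt(i);
            if (escaped) {
                cur.append(c);    // an escaped character is taken literally
                escaped = false;
            } else if (useEscape && c == escapedBy) {
                escaped = true;
            } else if (useEnclose && c == enclosedBy) {
                enclosed = !enclosed;
            } else if (c == fieldDelim && !enclosed) {
                fields.add(cur.toString());
                cur.setLength(0);
            } else {
                cur.append(c);
            }
        }
        fields.add(cur.toString());
        return fields;
    }

    public static void main(String[] args) {
        // Unguarded: the \000 swallows the delimiter, leaving one column.
        System.out.println(parse("foo\000|bar", '|', UNSET, UNSET, false).size()); // 1
        // Guarded: \000 is ordinary data, so two columns come back.
        System.out.println(parse("foo\000|bar", '|', UNSET, UNSET, true).size());  // 2
    }
}
```

With the guard switched on, both options are skipped when they are \000, so {{foo\000|bar}} splits into two columns and the \000 stays in the first value.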

  was:
In {{DelimiterSet}} there is the following comment above two option variables:

{code:java}
// If these next two fields are '\000', then they are ignored.
private char enclosedBy;
private char escapedBy;
{code}

We just found a problem with this whilst doing a Sqoop export.  Looking at the 
code in {{RecordParser}} it appears that although the comment says they would 
be ignored if set to \000 they actually aren't.

For some reason some of the records we're trying to export have \000 in a 
column.  This is fine as long as the \000 isn't just before the delimiter.

This is fine {{foo\000bar|moo}} - two columns are exported.
This isn't fine {{foo\000|bar}} - only one column is exported.

Looking through {{RecordParser}}, the problem is that our \000 character is 
being treated as an enclosing character, so the parser then assumes the 
delimiter is part of a value.  {{enclosedBy}} is \000 by default, which should 
mean its value is ignored, but when a \000 is encountered in the data it is 
still picked up.


> EnclosedBy and EscapedBy set to \000 are not ignored
> ----------------------------------------------------
>
>                 Key: SQOOP-1495
>                 URL: https://issues.apache.org/jira/browse/SQOOP-1495
>             Project: Sqoop
>          Issue Type: Bug
>    Affects Versions: 1.4.5
>            Reporter: Peter Hannam
>            Priority: Minor
>
> In {{DelimiterSet}} there is the following comment above two option variables:
> {code:java}
> // If these next two fields are '\000', then they are ignored.
> private char enclosedBy;
> private char escapedBy;
> {code}
> We just found a problem with this whilst doing a Sqoop export without 
> setting the parameters for enclosing or escaping (i.e. they're left at the 
> default \000).  Looking at the code in {{RecordParser}}, it appears that 
> although the comment says they would be ignored if set to \000, they actually 
> aren't.
> For some reason some of the records we're trying to export have \000 in a 
> column.  This is fine as long as the \000 isn't just before the delimiter.
> This is fine {{foo\000bar|moo}} - two columns are exported.
> This isn't fine {{foo\000|bar}} - only one column is exported.
> Looking through {{RecordParser}}, the problem is that our \000 character is 
> being treated as an enclosing character, so the parser then assumes the 
> delimiter is part of a value.  {{enclosedBy}} is \000 by default, which 
> should mean its value is ignored, but when a \000 is encountered in the data 
> it is still picked up.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
