[
https://issues.apache.org/jira/browse/SQOOP-3480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Samet Karadag updated SQOOP-3480:
---------------------------------
Description:
If the enclosed-by and escaped-by characters are both the double quote (\"), the
quote is escaped twice, which duplicates the double-quote characters in the output.
Example:
gcloud dataproc jobs submit hadoop --cluster=sqoop --region=europe-west4
--class=org.apache.sqoop.Sqoop --jars=$libs -- import
-Dmapreduce.job.user.classpath.first=true --connect=jdbc:****
--target-dir=gs://my-oracle-extract/EMPLOYEES --table=HR.EMPLOYEES
--enclosed-by '\"' --escaped-by \" --fields-terminated-by '|' --null-string ''
--null-non-string '' --as-textfile
This causes the field <test field " >
to be enclosed and escaped as <"test field """"">,
which has 2 extra double quotes (the single quote becomes """" instead of "").
BigQuery requires the double quote as the escape character, and the field must
also be enclosed by " so that it can contain newlines.
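The duplication can be reproduced outside Sqoop with a minimal sketch. It assumes escaping is legal and models only the two replace steps quoted from FieldFormatter.java; the class name DoubleEscapeDemo and the three-argument method signature are hypothetical and are not Sqoop's API.

```java
class DoubleEscapeDemo {
    // Simplified stand-in for the two escaping steps in FieldFormatter.java.
    // Hypothetical signature: the real method takes its delimiters differently.
    static String escapeAndEnclose(String str, char escape, char enclose) {
        // Step 1: escape any instances of the escape char itself.
        String withEscapes = str.replace("" + escape, "" + escape + escape);
        // Step 2: escape the encloser. When enclose == escape, this doubles
        // the characters that step 1 has already doubled.
        withEscapes = withEscapes.replace("" + enclose, "" + escape + enclose);
        return "" + enclose + withEscapes + enclose;
    }

    public static void main(String[] args) {
        // Input is: test field "   (no trailing space)
        System.out.println(escapeAndEnclose("test field \"", '"', '"'));
        // prints "test field """""
    }
}
```

The single embedded quote becomes four quotes, followed by the closing encloser, which matches the output reported above.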
The code in FieldFormatter.java should be changed from this:
if (escapingLegal) {
  // escaping is legal. Escape any instances of the escape char itself.
  withEscapes = str.replace("" + escape, "" + escape + escape);
} else {
  // no need to double-escape
  withEscapes = str;
}
// if we have an enclosing character, and escaping is legal, then the
// encloser must always be escaped.
if (escapingLegal) {
  withEscapes = withEscapes.replace("" + enclose, "" + escape + enclose);
}
to this:
if (escapingLegal) {
  // escaping is legal. Escape any instances of the escape char itself.
  withEscapes = str.replace("" + escape, "" + escape + escape);
} else {
  // no need to double-escape
  withEscapes = str;
}
// if we have an enclosing character, and escaping is legal, then the
// encloser must always be escaped, unless it is the same character as the
// escape char, in which case it was already escaped above.
if (escapingLegal && enclose != escape) {
  withEscapes = withEscapes.replace("" + enclose, "" + escape + enclose);
}
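As a rough check of the proposed condition, the sketch below applies the enclose != escape guard in a standalone method. It assumes escaping is legal; the class name EscapeFixSketch and the method signature are hypothetical, and this is not a patch against Sqoop itself.

```java
class EscapeFixSketch {
    // Hypothetical standalone version of the proposed logic.
    static String escapeAndEnclose(String str, char escape, char enclose) {
        // Escape any instances of the escape char itself.
        String withEscapes = str.replace("" + escape, "" + escape + escape);
        // Escape the encloser only when it differs from the escape char;
        // otherwise the first replace has already handled it.
        if (enclose != escape) {
            withEscapes = withEscapes.replace("" + enclose, "" + escape + enclose);
        }
        return "" + enclose + withEscapes + enclose;
    }

    public static void main(String[] args) {
        // Same character for escape and enclose: the quote is doubled once.
        System.out.println(escapeAndEnclose("test field \"", '"', '"'));
        // prints "test field """

        // Different characters: both replace steps still apply.
        System.out.println(escapeAndEnclose("a\"b\\c", '\\', '"'));
        // prints "a\"b\\c"
    }
}
```

With the guard in place, the embedded quote is doubled exactly once, and the behaviour for distinct escape and enclose characters is unchanged.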
was:
If the enclosed-by and escaped-by characters are both the double quote (\"), the
quote is escaped twice, which duplicates the double-quote characters in the output.
Example:
gcloud dataproc jobs submit hadoop --cluster=sqoop --region=europe-west4
--class=org.apache.sqoop.Sqoop --jars=$libs -- import
-Dmapreduce.job.user.classpath.first=true --connect=jdbc:****
--target-dir=gs://my-oracle-extract/EMPLOYEES --table=HR.EMPLOYEES
--enclosed-by '\"' --escaped-by \" --fields-terminated-by '|' --null-string ''
--null-non-string '' --as-textfile
This causes the field <test field " >
to be enclosed and escaped as <"test field """"">,
which has 2 extra double quotes (the single quote becomes """" instead of "").
BigQuery requires the double quote as the escape character, and the field must
also be enclosed by " so that it can contain newlines.
The code in FieldFormatter.java should be changed from this:
if (escapingLegal) {
// escaping is legal. Escape any instances of the escape char itself.
withEscapes = str.replace("" + escape, "" + escape + escape);
} else {
// no need to double-escape
withEscapes = str;
}
// if we have an enclosing character, and escaping is legal, then the
// encloser must always be escaped.
if (escapingLegal) {
withEscapes = withEscapes.replace("" + enclose, "" + escape + enclose);
}
to this:
boolean alreadyEscaped = false;
if (escapingLegal && !alreadyEscaped) {
// escaping is legal. Escape any instances of the escape char itself.
withEscapes = str.replace("" + escape, "" + escape + escape);
alreadyEscaped = true;
} else {
// no need to double-escape
withEscapes = str;
}
// if we have an enclosing character, and escaping is legal, then the
// encloser must always be escaped.
if (escapingLegal && !alreadyEscaped) {
withEscapes = withEscapes.replace("" + enclose, "" + escape + enclose);
}
> if enclosed-by and escaped-by characters are both double quote (\"). This
> causes duplicate escapes and thus duplicate characters in double quotes
> ------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: SQOOP-3480
> URL: https://issues.apache.org/jira/browse/SQOOP-3480
> Project: Sqoop
> Issue Type: Bug
> Reporter: Samet Karadag
> Priority: Blocker
--
This message was sent by Atlassian Jira
(v8.3.4#803005)