[ 
https://issues.apache.org/jira/browse/AIRFLOW-2053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349967#comment-16349967
 ] 

ASF subversion and git services commented on AIRFLOW-2053:
----------------------------------------------------------

Commit fd4360b9f0954b3dd4a960153178a06112f05a33 in incubator-airflow's branch 
refs/heads/master from [~kaxilnaik]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=fd4360b ]

[AIRFLOW-2053] Fix quote character bug in BQ hook

Modified the condition to check whether
quote_character is set (i.e. is not None). This allows
`quote_character` to be set to an empty string when the data
doesn't contain quoted sections.

Closes #2996 from kaxil/bq_hook_quote_fix


> BigQuery Hook bug when data doesn't contain quoted values
> ---------------------------------------------------------
>
>                 Key: AIRFLOW-2053
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2053
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: gcp
>    Affects Versions: 1.9.0, 1.8.2
>            Reporter: Kaxil Naik
>            Assignee: Kaxil Naik
>            Priority: Minor
>             Fix For: 2.0.0
>
>
> The BigQuery API states 
> [here|https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs#configuration.load.quote]
> that:
> {quote}The value that is used to quote data sections in a CSV file. BigQuery 
> converts the string to ISO-8859-1 encoding, and then uses the first byte of 
> the encoded string to split the data in its raw, binary state. The default 
> value is a double-quote ('"'). If your data does not contain quoted sections, 
> set the property value to an empty string. {quote}
> But the [current 
> implementation|https://github.com/apache/incubator-airflow/blob/6ee4bbd4b1bc4b3f275f7946e2bcdd123970e2dd/airflow/contrib/hooks/bigquery_hook.py#L802]
>  of `run_load` in the BigQuery hook uses an incorrect check to decide 
> whether to include `quote_character`.
> The code is currently:
> {code:python}
>         if 'fieldDelimiter' not in src_fmt_configs:
>             src_fmt_configs['fieldDelimiter'] = field_delimiter
>         if quote_character:
>             src_fmt_configs['quote'] = quote_character
>         if allow_quoted_newlines:
>             src_fmt_configs['allowQuotedNewlines'] = allow_quoted_newlines
> {code}
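> If my data doesn't have quote characters, then as per the BQ API docs I need 
> to set `quote=''`, i.e. an empty string. The above condition 
> `if quote_character:` evaluates to false for an empty string, so `quote` is 
> never added to the config and BigQuery falls back to its default 
> double-quote. Hence, I get the following error: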
> {code:json}
> {'message': 'Error detected while parsing row starting at position: 0. Error: 
> Data between close double quote (") and field separator.', 'reason': 
> 'invalid'}
> {code}
> So the condition should be:
> {code:python}
>         if quote_character is not None:
>             src_fmt_configs['quote'] = quote_character
> {code}
>  
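
For illustration, the difference between the two checks can be reproduced outside Airflow. The following is only a minimal sketch: `build_src_fmt_configs` is a hypothetical stand-alone helper that mirrors the snippet quoted above, not the actual `run_load` method, and its default argument values are assumptions.

```python
def build_src_fmt_configs(field_delimiter=',', quote_character=None,
                          allow_quoted_newlines=False, src_fmt_configs=None):
    """Hypothetical helper mirroring the config-building snippet from the issue."""
    configs = dict(src_fmt_configs or {})
    if 'fieldDelimiter' not in configs:
        configs['fieldDelimiter'] = field_delimiter
    # `is not None` keeps an explicitly passed empty string; a plain
    # truthiness test (`if quote_character:`) would silently drop it.
    if quote_character is not None:
        configs['quote'] = quote_character
    if allow_quoted_newlines:
        configs['allowQuotedNewlines'] = allow_quoted_newlines
    return configs


# An empty string is falsy, which is why the old check skipped it.
print(bool(''))
# With the corrected check, the empty quote character is kept.
print(build_src_fmt_configs(quote_character=''))
# When nothing is passed, 'quote' is left out entirely.
print(build_src_fmt_configs())
```

With the corrected check, passing `quote_character=''` keeps the `quote` key in the config, while leaving it as `None` omits the key.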



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
