[ https://issues.apache.org/jira/browse/AIRFLOW-2053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kaxil Naik updated AIRFLOW-2053:
--------------------------------
Description:
The BigQuery API documentation states
[here|https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs#configuration.load.quote]
that:
{quote}The value that is used to quote data sections in a CSV file. BigQuery
converts the string to ISO-8859-1 encoding, and then uses the first byte of the
encoded string to split the data in its raw, binary state. The default value is
a double-quote ('"'). If your data does not contain quoted sections, set the
property value to an empty string. {quote}
But the [current
implementation|https://github.com/apache/incubator-airflow/blob/6ee4bbd4b1bc4b3f275f7946e2bcdd123970e2dd/airflow/contrib/hooks/bigquery_hook.py#L802]
of `run_load` in the BigQuery hook uses an incorrect check to decide whether to
include `quote_character`.
The code currently is:
{code:python}
if 'fieldDelimiter' not in src_fmt_configs:
    src_fmt_configs['fieldDelimiter'] = field_delimiter
if quote_character:
    src_fmt_configs['quote'] = quote_character
if allow_quoted_newlines:
    src_fmt_configs['allowQuotedNewlines'] = allow_quoted_newlines
{code}
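If my data doesn't have quote characters, then per the BQ API docs I need to
set `quote=''`, i.e. an empty string. The above condition `if quote_character:`
evaluates to false for an empty string, so `quote` is never added to the
configuration and BigQuery falls back to its default double-quote. Hence, I get
the following error: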
{code:json}
{'message': 'Error detected while parsing row starting at position: 0. Error:
Data between close double quote (") and field separator.', 'reason': 'invalid'}
{code}
So, the condition should be:
{code:python}
if quote_character is not None:
    src_fmt_configs['quote'] = quote_character
{code}
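To illustrate the difference, here is a minimal standalone sketch. Note that `build_csv_options` is a hypothetical helper written only for this example; it is not the actual `run_load` code and just mirrors the three checks shown above, with the proposed `is not None` condition applied.

```python
def build_csv_options(field_delimiter=',', quote_character=None,
                      allow_quoted_newlines=False, src_fmt_configs=None):
    """Hypothetical helper mirroring the checks above (not the real run_load)."""
    # Copy so the caller's dict is not mutated.
    configs = dict(src_fmt_configs or {})
    if 'fieldDelimiter' not in configs:
        configs['fieldDelimiter'] = field_delimiter
    # Proposed check: an explicit empty string is passed through,
    # while None (argument not supplied) still leaves 'quote' unset.
    if quote_character is not None:
        configs['quote'] = quote_character
    if allow_quoted_newlines:
        configs['allowQuotedNewlines'] = allow_quoted_newlines
    return configs


# The old truthiness check fails because an empty string is falsy in Python.
print(bool(''))                                  # False
print(build_csv_options(quote_character=''))     # 'quote' is present and empty
print(build_csv_options())                       # 'quote' is absent
```

With the old `if quote_character:` check, the second call would have dropped the `quote` key, which is the case that triggers the error above.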
was:
The BigQuery API documentation states
[here|https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs] that:
{quote}The value that is used to quote data sections in a CSV file. BigQuery
converts the string to ISO-8859-1 encoding, and then uses the first byte of the
encoded string to split the data in its raw, binary state. The default value is
a double-quote ('"'). If your data does not contain quoted sections, set the
property value to an empty string. {quote}
But the [current
implementation|https://github.com/apache/incubator-airflow/blob/6ee4bbd4b1bc4b3f275f7946e2bcdd123970e2dd/airflow/contrib/hooks/bigquery_hook.py#L802]
of `run_load` in the BigQuery hook uses an incorrect check to decide whether to
include `quote_character`.
The code currently is:
{code:python}
if 'fieldDelimiter' not in src_fmt_configs:
    src_fmt_configs['fieldDelimiter'] = field_delimiter
if quote_character:
    src_fmt_configs['quote'] = quote_character
if allow_quoted_newlines:
    src_fmt_configs['allowQuotedNewlines'] = allow_quoted_newlines
{code}
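If my data doesn't have quote characters, then per the BQ API docs I need to
set `quote=''`, i.e. an empty string. The above condition `if quote_character:`
evaluates to false for an empty string, so `quote` is never added to the
configuration and BigQuery falls back to its default double-quote. Hence, I get
the following error: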
{code:json}
{'message': 'Error detected while parsing row starting at position: 0. Error:
Data between close double quote (") and field separator.', 'reason': 'invalid'}
{code}
So, the condition should be:
{code:python}
if quote_character is not None:
    src_fmt_configs['quote'] = quote_character
{code}
> BigQuery Hook bug when data doesn't contain quoted values
> ---------------------------------------------------------
>
> Key: AIRFLOW-2053
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2053
> Project: Apache Airflow
> Issue Type: Bug
> Components: gcp
> Affects Versions: 1.9.0, 1.8.2
> Reporter: Kaxil Naik
> Assignee: Kaxil Naik
> Priority: Minor
> Fix For: Airflow 2.0, 2.0.0
>
>