[ 
https://issues.apache.org/jira/browse/AIRFLOW-3316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16732845#comment-16732845
 ] 

ASF GitHub Bot commented on AIRFLOW-3316:
-----------------------------------------

conradlee commented on pull request #4430: AIRFLOW-3316 be sure to initialize 
schema_fields variable
URL: https://github.com/apache/incubator-airflow/pull/4430
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [X] My PR addresses the following [Airflow 
Jira](https://issues.apache.org/jira/browse/AIRFLOW-3316/) issues and 
references them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR"
     - https://issues.apache.org/jira/browse/AIRFLOW-XXX
     - In case you are fixing a typo in the documentation you can prepend your 
commit with \[AIRFLOW-XXX\], code changes always need a Jira issue.
   
   ### Description
   
   - [X] Here are some details about my PR, including screenshots of any UI 
changes:
   The execute method of the GoogleCloudStorageToBigQueryOperator will in 
certain situations fail to initialize a variable called `schema_fields`, which 
it later references. See the Jira issue for more details.
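   The failure mode can be reproduced with a minimal sketch (the class and 
stubbed branches below are illustrative, not the operator's actual code): a 
local variable assigned only in some branches raises `UnboundLocalError` when 
a later statement reads it.

```python
class Op:
    """Minimal stand-in for the operator's buggy branching (hypothetical)."""

    def __init__(self, schema_fields=None, schema_object=None, autodetect=True):
        self.schema_fields = schema_fields
        self.schema_object = schema_object
        self.autodetect = autodetect

    def execute(self):
        if not self.schema_fields:
            if self.schema_object:
                # Stands in for downloading the schema object from GCS.
                schema_fields = [{"name": "fake"}]
            elif self.schema_object is None and self.autodetect is False:
                raise ValueError("At least one of schema_fields, schema_object, "
                                 "or autodetect must be passed.")
            # Missing branch: autodetect=True with no schema assigns nothing.
        else:
            schema_fields = self.schema_fields
        # Reading the local here raises UnboundLocalError in the uncovered case.
        return schema_fields
```

Calling `Op().execute()` (no schema, `autodetect=True`) raises 
`UnboundLocalError`, matching the traceback in the issue below.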
   
   ### Tests
   
   - [X] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   It's a simple one-line fix which is very easy to reason about.
   
   ### Commits
   
   - [X] My commits all reference Jira issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
     1. Subject is separated from body by a blank line
     1. Subject is limited to 50 characters (not including Jira issue reference)
     1. Subject does not end with a period
     1. Subject uses the imperative mood ("add", not "adding")
     1. Body wraps at 72 characters
     1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [X] In case of new functionality, my PR adds documentation that describes 
how to use it.
     - When adding new operators/hooks/sensors, the autoclass documentation 
generation needs to be added.
     - All the public functions and the classes in the PR contain docstrings 
that explain what it does
   
   ### Code Quality
   
   - [X] Passes `flake8`
   
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


> GCS to BQ operator leaves schema_fields variable unset when autodetect=True
> ---------------------------------------------------------------------------
>
>                 Key: AIRFLOW-3316
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-3316
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: operators
>    Affects Versions: 1.10.1
>            Reporter: Conrad Lee
>            Assignee: Conrad Lee
>            Priority: Minor
>
> When I use the GoogleCloudStorageToBigQueryOperator to load data from Parquet 
> into BigQuery, I leave the schema_fields argument set to 'None' and set 
> autodetect=True.
>  
> This causes the following error: 
>  
> {code:java}
> [2018-11-08 09:42:03,690] {models.py:1736} ERROR - local variable 
> 'schema_fields' referenced before assignment
> Traceback (most recent call last):
>   File "/usr/local/lib/airflow/airflow/models.py", line 1633, in _run_raw_task
>     result = task_copy.execute(context=context)
>   File "/home/airflow/gcs/plugins/bq_operator_updated.py", line 2018, in execute
>     schema_fields=schema_fields
> UnboundLocalError: local variable 'schema_fields' referenced before assignment
> {code}
>  
> The problem is that this set of checks, which sets the `schema_fields` 
> variable, neglects to cover all of the cases:
> {code:java}
> if not self.schema_fields:
>   if self.schema_object and self.source_format != 'DATASTORE_BACKUP':
>     gcs_hook = GoogleCloudStorageHook(
>         google_cloud_storage_conn_id=self.google_cloud_storage_conn_id, 
>         delegate_to=self.delegate_to)
>     schema_fields = json.loads(gcs_hook.download(
>       self.bucket,
>       self.schema_object).decode("utf-8"))
>   elif self.schema_object is None and self.autodetect is False:
>     raise ValueError('At least one of `schema_fields`, `schema_object`, '
>     'or `autodetect` must be passed.')
> else:
>     schema_fields = self.schema_fields
> {code}
> After the `elif` we need to handle the case where `autodetect` is set to 
> True. This can be done by adding two lines:
> {code:java}
> if not self.schema_fields:
>   if self.schema_object and self.source_format != 'DATASTORE_BACKUP':
>     gcs_hook = GoogleCloudStorageHook(
>         google_cloud_storage_conn_id=self.google_cloud_storage_conn_id, 
>         delegate_to=self.delegate_to)
>     schema_fields = json.loads(gcs_hook.download(
>       self.bucket,
>       self.schema_object).decode("utf-8"))
>   elif self.schema_object is None and self.autodetect is False:
>     raise ValueError('At least one of `schema_fields`, `schema_object`, '
>     'or `autodetect` must be passed.')
>   else:
>     schema_fields = None
> else:
>     schema_fields = self.schema_fields
> {code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
