IKholopov commented on code in PR #28284:
URL: https://github.com/apache/airflow/pull/28284#discussion_r1048935913


##########
airflow/providers/google/cloud/transfers/gcs_to_bigquery.py:
##########
@@ -544,43 +494,272 @@ def _check_schema_fields(self, table_resource):
         :param table_resource: Configuration or table_resource dictionary
         :return: table_resource: Updated table_resource dict with schema_fields
         """
-        if not self.autodetect and not self.schema_fields:
-            raise RuntimeError(
-                "Table schema was not found. Set autodetect=True to "
-                "automatically set schema fields from source objects or pass "
-                "schema_fields explicitly"
-            )
-        elif not self.schema_fields:
+        if not self.schema_fields:

Review Comment:
   This is still broken in multiple places. 
   
   For example, it makes the assumption about the delimiter being "," instead 
of the value of `field_delimiter` parameter, same for `quote_character` 
parameter.
   
   Overall, I don't think that attempt to replicate the logic of determining 
schema instead of just letting "autodetect" and BigQuery do their thing is a 
wise idea - there are numerous edge cases that we would need to cover and we 
will continue getting issues because of implementations getting out of sync.
   
   I believe that the good approach would be to remove the explicit setting of 
`"schema"` in configuration if `autodetect==True` or remove 
`_check_schema_fields()` altogether.



##########
tests/system/providers/google/cloud/gcs/example_gcs_to_bigquery_async.py:
##########
@@ -81,6 +81,20 @@
         max_id_key=MAX_ID_DATE,
         deferrable=True,
     )
+

Review Comment:
   Can we add a test with non-default values of `field_delimiter`, 
`quote_character` parameters?



##########
airflow/providers/google/cloud/transfers/gcs_to_bigquery.py:
##########
@@ -322,101 +345,29 @@ def execute(self, context: Context):
             if self.schema_object and self.source_format != "DATASTORE_BACKUP":
                 schema_fields = json.loads(gcs_hook.download(self.bucket, 
self.schema_object).decode("utf-8"))
                 self.log.info("Autodetected fields from schema object: %s", 
schema_fields)

Review Comment:
   This logging statement is not correct. Nothing got *autodetected* here, we 
just loaded the explicitly specified schema.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to