dlamblin commented on a change in pull request #11113:
URL: https://github.com/apache/airflow/pull/11113#discussion_r494159142



##########
File path: airflow/providers/google/cloud/operators/bigquery.py
##########
@@ -20,33 +20,52 @@
 """
 This module contains Google BigQuery operators.
 """
+import datetime
 import enum
 import hashlib
 import json
+import logging
+import random
 import re
+import string
 import uuid
 import warnings
 from typing import Any, Dict, Iterable, List, Optional, Sequence, Set, SupportsAbs, Union
+from urllib.parse import urlsplit
 
 import attr
 from google.api_core.exceptions import Conflict
 from google.cloud.bigquery import TableReference
 
+from great_expectations.data_context.types.base import DataContextConfig

Review comment:
       The dependency will need to be added as an extra in setup.py.
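   For reference, something along these lines is what I mean; a minimal setuptools sketch, not the actual layout of Airflow's setup.py, and the version pin is just a placeholder:

```python
# Hypothetical sketch: expose great_expectations as an optional extra so it is
# only installed when users opt in, e.g. `pip install apache-airflow[great_expectations]`.
from setuptools import setup

great_expectations = [
    'great_expectations>=0.12.0',  # version pin is an assumption, not a tested constraint
]

setup(
    # ... existing arguments elided ...
    extras_require={
        'great_expectations': great_expectations,
    },
)
```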

##########
File path: airflow/providers/google/cloud/operators/bigquery.py
##########
@@ -2109,3 +2128,265 @@ def execute(self, context: Any):
     def on_kill(self):
         if self.job_id and self.cancel_on_kill:
             self.hook.cancel_job(job_id=self.job_id, project_id=self.project_id, location=self.location)
+
+
+class GreatExpectationsBigQueryOperator(BaseOperator):

Review comment:
       I'm not much of a committer, so take this with a grain of salt as a user suggestion, but: I'd like to see this broken into two classes, `GreatExpectationsBaseOperator` and `GreatExpectationsBigQueryOperator`.
   
   The former could have a method `_configure_project(…)` composed of calls to `_configure_datasources(…)`, `_configure_stores(…)`, and `_configure_docs(…)`, each of which would raise `NotImplementedError`. Its init would not mention the GCP-specific params but would still cover the validations, email, and other parameters that are general. The latter could then implement these methods to build the config used in the base's execute method, and its init could call super with kwargs. This would leave the door open for users who want to run expectations against one of the other supported datasources to implement their own subclass.
   
   I'm a little unsure how you could support, e.g., a user who wants to data-quality check a MySQL db with the stores and docs in GCP versus checking a MySQL db with the stores and docs on AWS. I almost want to recommend mixins, but am unsure.
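   
   To make the suggestion concrete, here is a rough, untested sketch of the split I have in mind; the class responsibilities follow the description above, but all method names, parameters, and bodies are placeholders, not a reviewed design:

```python
# Rough sketch of the suggested split; names and bodies are illustrative only.
from airflow.models import BaseOperator


class GreatExpectationsBaseOperator(BaseOperator):
    """Backend-agnostic half: validations, email/notification params, execute()."""

    def _configure_datasources(self, config):
        # Backend-specific; subclasses fill in datasource details.
        raise NotImplementedError

    def _configure_stores(self, config):
        # Backend-specific; subclasses decide where validation/expectation stores live.
        raise NotImplementedError

    def _configure_docs(self, config):
        # Backend-specific; subclasses decide where data docs are hosted.
        raise NotImplementedError

    def _configure_project(self, config):
        # Compose the backend-specific pieces into one project config.
        self._configure_datasources(config)
        self._configure_stores(config)
        self._configure_docs(config)
        return config

    def execute(self, context):
        config = self._configure_project({})  # placeholder config object
        # ... run the expectation suite against the configured datasource ...


class GreatExpectationsBigQueryOperator(GreatExpectationsBaseOperator):
    """BigQuery/GCS-specific wiring; GCP params live only here."""

    def __init__(self, *, gcp_project=None, gcs_bucket=None, **kwargs):
        super().__init__(**kwargs)  # general params handled by the base
        self.gcp_project = gcp_project
        self.gcs_bucket = gcs_bucket

    def _configure_datasources(self, config):
        config['datasources'] = {}  # BigQuery connection details elided

    def _configure_stores(self, config):
        config['stores'] = {}  # validation/expectation stores on GCS elided

    def _configure_docs(self, config):
        config['data_docs_sites'] = {}  # data docs site on GCS elided
```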




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

