dlamblin commented on a change in pull request #11113:
URL: https://github.com/apache/airflow/pull/11113#discussion_r494159142
##########
File path: airflow/providers/google/cloud/operators/bigquery.py
##########
@@ -20,33 +20,52 @@
"""
This module contains Google BigQuery operators.
"""
+import datetime
import enum
import hashlib
import json
+import logging
+import random
import re
+import string
import uuid
import warnings
from typing import Any, Dict, Iterable, List, Optional, Sequence, Set, SupportsAbs, Union
+from urllib.parse import urlsplit
import attr
from google.api_core.exceptions import Conflict
from google.cloud.bigquery import TableReference
+from great_expectations.data_context.types.base import DataContextConfig
Review comment:
The dependency will need to get added as an extra to setup.py
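Something roughly like the following, I think (illustrative only; the exact list/dict names, the `EXTRAS_REQUIREMENTS` key, and the version bound are my assumptions, not the actual layout of Airflow's setup.py):

```python
# setup.py (sketch): declare the new dependency ...
great_expectations = [
    'great-expectations>=0.12.0',  # version bound is a guess; pin to whatever the operator needs
]

# ... and expose it as an extra so users can `pip install apache-airflow[great_expectations]`.
# Assuming extras are collected in a dict like EXTRAS_REQUIREMENTS (the name varies by version):
EXTRAS_REQUIREMENTS['great_expectations'] = great_expectations
```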
##########
File path: airflow/providers/google/cloud/operators/bigquery.py
##########
@@ -2109,3 +2128,265 @@ def execute(self, context: Any):
    def on_kill(self):
        if self.job_id and self.cancel_on_kill:
            self.hook.cancel_job(job_id=self.job_id, project_id=self.project_id, location=self.location)
+
+
+class GreatExpectationsBigQueryOperator(BaseOperator):
Review comment:
I'm not much of a committer, so take this with a grain of salt as a user suggestion, but:
I'd like to see this broken into two classes: `GreatExpectationsBaseOperator` and `GreatExpectationsBigQueryOperator`. The former could have a method `_configure_project(…)` composed of calls to `_configure_datasources(…)`, `_configure_stores(…)`, and `_configure_docs(…)`, each of which would raise `NotImplementedError` in the base. Its `__init__` would not mention the GCP-specific params but would still cover the validations, email, and other general parameters. The latter could then implement those methods to build the config used in the base's `execute` method, and its `__init__` could call `super().__init__(**kwargs)`. This would leave it open to users who want to run expectations against one of the other supported datasources to implement their own subclass.
I'm a little unsure how you could support, e.g., a user who wants to data-quality check a MySQL DB with the stores and docs in GCP versus checking a MySQL DB with the stores and docs on AWS. I almost want to recommend mixins, but I'm unsure.
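Roughly what I have in mind (a sketch only: the parameter names like `expectation_suite_name`, `send_alert_email`, `gcp_project`, `bq_dataset_name`, and `gcs_bucket` are placeholders I'm making up here, and the Great Expectations config details are elided):

```python
from typing import Any, Dict, Optional

from airflow.models import BaseOperator


class GreatExpectationsBaseOperator(BaseOperator):
    """Runs a Great Expectations suite; datasource/store/docs config is backend-specific."""

    def __init__(
        self,
        *,
        expectation_suite_name: str,      # placeholder names for the "general" params
        send_alert_email: bool = True,
        email_to: Optional[str] = None,
        **kwargs,
    ) -> None:
        super().__init__(**kwargs)
        self.expectation_suite_name = expectation_suite_name
        self.send_alert_email = send_alert_email
        self.email_to = email_to

    # Backend-specific pieces; the base deliberately knows nothing about GCP or AWS.
    def _configure_datasources(self) -> Dict[str, Any]:
        raise NotImplementedError

    def _configure_stores(self) -> Dict[str, Any]:
        raise NotImplementedError

    def _configure_docs(self) -> Dict[str, Any]:
        raise NotImplementedError

    def _configure_project(self) -> Dict[str, Any]:
        # Composed from the three hooks above; the real operator would feed this into
        # DataContextConfig / the data context (exact kwargs depend on the
        # great_expectations version, so they're elided here).
        return {
            'datasources': self._configure_datasources(),
            'stores': self._configure_stores(),
            'data_docs_sites': self._configure_docs(),
        }

    def execute(self, context: Any):
        project_config = self._configure_project()
        # ... build the data context from project_config, run the suite, send the
        # alert email on failure, etc. -- i.e. what the current single-class
        # operator already does.


class GreatExpectationsBigQueryOperator(GreatExpectationsBaseOperator):
    def __init__(self, *, gcp_project: str, bq_dataset_name: str, gcs_bucket: str, **kwargs) -> None:
        # Only the GCP-specific params live here; everything general passes through.
        super().__init__(**kwargs)
        self.gcp_project = gcp_project
        self.bq_dataset_name = bq_dataset_name
        self.gcs_bucket = gcs_bucket

    def _configure_datasources(self) -> Dict[str, Any]:
        return {}  # build the BigQuery/SQLAlchemy datasource config from gcp_project / bq_dataset_name

    def _configure_stores(self) -> Dict[str, Any]:
        return {}  # build GCS-backed expectation/validation store configs from gcs_bucket

    def _configure_docs(self) -> Dict[str, Any]:
        return {}  # build the data-docs site config, also backed by gcs_bucket
```

For the mixed case (say, validating MySQL but keeping stores and docs in GCS), maybe the store/docs methods could come from a storage mixin combined with a datasource-specific operator, but that's the part I'm not sure about.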