joshua-stauffer opened a new pull request, #58273:
URL: https://github.com/apache/airflow/pull/58273
---
This pull request introduces a new provider package,
apache-airflow-providers-greatexpectations, which integrates Apache Airflow
with the open-source Great Expectations library for data quality validation.
## Background / Motivation
Great Expectations (GX Core) is an Apache-licensed Python framework for
validating, documenting, and profiling data.
This provider builds on existing work from Astronomer’s
[airflow-provider-great-expectations](https://github.com/astronomer/airflow-provider-great-expectations)
package.
## What this PR includes
- A new provider package following Airflow’s community provider conventions:
  - Operators for running GX Core validations from Airflow DAGs
  - Example DAGs demonstrating typical data-validation workflows
  - Documentation (README, usage guide, API docstrings)
- Comprehensive test suite including:
  - Open-source GX Core tests that run entirely offline against public
    sample data
  - Optional GX Cloud integration tests that validate the same operators
    using the GX Cloud API
- Compatibility metadata and constraints:
  - Requires GX Core ≥ 1.7
  - Compatible with Airflow ≥ 2.10 (including 3.0+)
  - Python ≥ 3.10
- Conforms to ASF licensing and follows the [Providers
  Lifecycle](https://github.com/apache/airflow/blob/main/providers/MANAGING_PROVIDERS_LIFECYCLE.rst)
## CI / Testing Details
The base provider tests run without external dependencies.
However, a small subset of integration tests exercises the optional GX Cloud
functionality.
To enable these in the Airflow CI environment, three secrets must be
configured:
```
GX_CLOUD_ACCESS_TOKEN
GX_CLOUD_ORGANIZATION_ID
GX_CLOUD_WORKSPACE_ID
```
These values are used exclusively for automated tests against a dedicated GX
Cloud workspace.
If these secrets are not present, the tests are automatically skipped, and
the rest of the suite will pass normally.
This approach ensures CI reproducibility while validating both the
open-source and managed GX Cloud use cases.
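The skip behaviour described above could look roughly like the following. This is a minimal sketch, not the provider's actual test code: it uses only the standard-library `unittest` module (the real suite most likely uses pytest markers), and the helper `missing_gx_cloud_secrets`, the decorator `requires_gx_cloud`, and the test class are hypothetical names introduced for illustration. Only the three environment variable names come from this description.

```python
import os
import unittest

# Secret names as listed in this PR description.
GX_CLOUD_SECRETS = (
    "GX_CLOUD_ACCESS_TOKEN",
    "GX_CLOUD_ORGANIZATION_ID",
    "GX_CLOUD_WORKSPACE_ID",
)


def missing_gx_cloud_secrets(env=None):
    """Return the secret names that are unset or empty in ``env``.

    Hypothetical helper; ``env`` defaults to the process environment.
    """
    env = os.environ if env is None else env
    return [name for name in GX_CLOUD_SECRETS if not env.get(name)]


def requires_gx_cloud(test_item):
    """Hypothetical decorator: skip ``test_item`` unless all secrets are set."""
    missing = missing_gx_cloud_secrets()
    reason = "missing GX Cloud secrets: " + ", ".join(missing)
    return unittest.skipIf(bool(missing), reason)(test_item)


@requires_gx_cloud
class TestGXCloudIntegration(unittest.TestCase):
    """Placeholder for tests that talk to the dedicated GX Cloud workspace."""

    def test_secrets_are_available(self):
        # Only runs when the decorator did not skip the class.
        self.assertEqual(missing_gx_cloud_secrets(), [])
```

With this shape, a CI run without the secrets reports the Cloud tests as skipped rather than failed, while the offline GX Core tests are unaffected.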
## Maintenance & Commitment
The Great Expectations and Astronomer teams jointly commit to:
- Ongoing maintenance of the provider (API updates, compatibility fixes, doc
improvements)
- Responding to community issues and PRs related to the provider
- Keeping tests current with future GX Core and Airflow releases
## Next Steps / Request for Review
We invite the Airflow community and committers to review this contribution
and provide feedback on:
- Provider structure and naming
- Operator and hook interfaces
- Dependency and packaging strategy
- CI integration and test design
- Documentation and examples
Once community feedback is addressed, we propose merging this under
providers/great_expectations/ and proceeding with the first release as
apache-airflow-providers-greatexpectations v1.0.0.
Thank you for your review and support!
We look forward to collaborating with the Airflow community to make this
integration broadly useful.