Re: [DISCUSS] AIP-4 System tests

Jarek Potiuk Wed, 04 Mar 2020 05:25:25 -0800

Hello. One small update.

Together with Bjorn Olsen, we are now checking how well the System Tests
approach we used for Google Cloud Platform can be applied to AWS. This
might be a good exercise to see whether we can apply it to other services
and make it part of releasing backport operators, fully automating it with
AIP-4. Later it can be a good start for AIP-8 (separate providers).
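
The approach could be sketched roughly as follows: a system test is an
ordinary test that talks to the real service and is skipped unless that
provider's tests were explicitly enabled. This is only an illustration under
assumptions: the `SYSTEM_TESTS_PROVIDERS` variable, the `system_test`
decorator and the test class are hypothetical names, not the mechanism used
in the GCP tests.

```python
import os
import unittest


def enabled_providers(environ=os.environ):
    """Return the set of providers whose system tests were explicitly enabled."""
    # SYSTEM_TESTS_PROVIDERS is a hypothetical, comma-separated variable.
    raw = environ.get("SYSTEM_TESTS_PROVIDERS", "")
    return {name.strip() for name in raw.split(",") if name.strip()}


def system_test(provider):
    """Decorator: skip the test unless system tests for `provider` are enabled."""
    return unittest.skipUnless(
        provider in enabled_providers(),
        f"system tests for {provider!r} are not enabled",
    )


@system_test("amazon")
class ExampleAwsSystemTest(unittest.TestCase):
    def test_run_example_dag(self):
        # Placeholder body: a real system test would run an example DAG
        # against the live service here, using real credentials.
        self.assertTrue(True)
```

With `SYSTEM_TESTS_PROVIDERS=amazon` set in the environment the class would
run; otherwise it is reported as skipped, so regular unit test runs are not
affected.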


I created a #system-tests channel in Slack, so anyone interested in the
subject is welcome. If you would like to implement and test system
tests for any of the providers, you are welcome to join! This is the
current list of providers we have. Some of them are super simple, some more
complex. With "google" we are going to address by far the biggest one :).


{
        "amazon": [setup.aws],
        "apache.cassandra": [setup.cassandra],
        "apache.druid": [setup.druid],
        "apache.hdfs": [setup.hdfs],
        "apache.hive": [setup.hive],
        "apache.pig": [],
        "apache.pinot": [setup.pinot],
        "apache.spark": [],
        "apache.sqoop": [],
        "celery": [setup.celery],
        "cloudant": [setup.cloudant],
        "cncf.kubernetes": [setup.kubernetes],
        "databricks": [setup.databricks],
        "datadog": [setup.datadog],
        "dingding": [],
        "discord": [],
        "docker": [setup.docker],
        "email": [],
        "ftp": [],
        "google.cloud": [setup.gcp],
        "google.marketing_platform": [setup.gcp],
        "google.suite": [setup.gcp],
        "grpc": [setup.grpc],
        "http": [],
        "imap": [],
        "jdbc": [setup.jdbc],
        "jenkins": [setup.jenkins],
        "jira": [setup.jira],
        "microsoft.azure": [setup.azure],
        "microsoft.mssql": [setup.mssql],
        "microsoft.winrm": [setup.winrm],
        "mongo": [setup.mongo],
        "mysql": [setup.mysql],
        "odbc": [setup.odbc],
        "openfass": [],
        "opsgenie": [],
        "oracle": [setup.oracle],
        "pagerduty": [setup.pagerduty],
        "papermill": [setup.papermill],
        "postgres": [setup.postgres],
        "presto": [setup.presto],
        "qubole": [setup.qds],
        "redis": [setup.redis],
        "salesforce": [setup.salesforce],
        "samba": [setup.samba],
        "segment": [setup.segment],
        "sftp": [setup.ssh],
        "slack": [setup.slack],
        "snowflake": [setup.snowflake],
        "sqlite": [],
        "ssh": [setup.ssh],
        "vertica": [setup.vertica],
        "zendesk": [setup.zendesk],
}
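
A minimal sketch of how a mapping like the one above could be queried, for
example to see which providers need no extra dependencies before their
system tests can run. The requirement strings are placeholders standing in
for the `setup.*` lists, and `provider_requirements` and both helper
functions are hypothetical names used only for illustration.

```python
from typing import Dict, List

# Small excerpt of the mapping; the strings are placeholders for the
# real requirement lists (setup.aws, setup.gcp, setup.ssh).
provider_requirements: Dict[str, List[str]] = {
    "amazon": ["<aws requirements>"],
    "apache.pig": [],
    "google.cloud": ["<gcp requirements>"],
    "http": [],
    "sftp": ["<ssh requirements>"],
    "ssh": ["<ssh requirements>"],
}


def providers_without_extras(mapping: Dict[str, List[str]]) -> List[str]:
    """Return providers with an empty requirement list, sorted by name."""
    return sorted(name for name, reqs in mapping.items() if not reqs)


def providers_with_extras(mapping: Dict[str, List[str]]) -> List[str]:
    """Return providers that declare at least one requirement, sorted by name."""
    return sorted(name for name, reqs in mapping.items() if reqs)
```

For the excerpt above, `providers_without_extras(provider_requirements)`
gives `["apache.pig", "http"]`.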




J.


On Fri, Feb 21, 2020 at 2:50 PM Jarek Potiuk <jarek.pot...@polidea.com>
wrote:

> Any more comments on system tests? I would love to vote on AIP-4, and
> my current proposal would be:
>
> 1) Let's try to automate system test execution (starting with GCP as it is
> close to being ready). That would most likely be with GitHub Actions -
> details to be worked on.
> 2) We can use it to automate testing of Backport operators (which
> complete AIP-21)
> 3) We can build it in such a way that other providers' tests can be executed
> automatically as well, provided that there is a contribution with system
> tests.
>
> WDYT ?
>
>
> J.
>
>
> On Sat, Feb 15, 2020 at 8:59 PM Jarek Potiuk <jarek.pot...@polidea.com>
> wrote:
>
>>
>>
>> On Sat, Feb 15, 2020, 20:54 Ash Berlin-Taylor <a...@apache.org> wrote:
>>
>>> Yeah, I'm for this.
>>>
>>> In fact I'm about to mark some of the Hive ones as system tests as they
>>> require a running Hive cluster.
>>> I would be careful about automatically marking unit tests that run
>>> DAGs as system/integration tests though; a number of our unit tests rely on
>>> this to test the tasks in various states in parts of the scheduler. Ideally
>>> they wouldn't, but right now they do, and the tasks they run are of the
>>> DummyOperator or "bash_command=date" flavour.
>>>
>>
>> Agree with Ash here. I think for this we should have yet another category
>> ('dag tests'? 'core tests'?) - those are indeed run using DAGs run with the
>> whole Airflow underneath, but their purpose is to test Airflow core, not
>> external systems.
>>
>>
>> j.
>>
>>
>>> -ash
>>> On Feb 15 2020, at 7:28 pm, Tomasz Urbaszek <turbas...@apache.org>
>>> wrote:
>>> > +1 for introducing system tests. Lack of them is a big pain.
>>> >
>>> > I would also like to suggest marking some existing tests (those running
>>> > DAGs, etc.) as system tests. Then we can simplify our unit tests and
>>> > probably speed up CI builds (not to mention the reduction of side
>>> > effects). The approach used for GCP system tests, which runs an example
>>> > DAG, makes creating such tests really easy (or we can generate them
>>> > automatically...).
>>> >
>>> > Regarding the frequency of such tests, I think we should run all of
>>> > them daily on master. Or run them when there is a change in specific
>>> > files (operators / hooks etc).
>>> >
>>> > Tomek
>>> >
>>> > On Sat, Feb 15, 2020 at 1:15 PM Jarek Potiuk <jarek.pot...@polidea.com>
>>> > wrote:
>>> > >
>>> > > TL;DR: I would like to revive a discussion (hopefully short :) and
>>> > > possibly cast a vote on "AIP-4 - Support for System Tests for external
>>> > > systems".
>>> > >
>>> > >
>>> > > https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-4+Support+for+System+Tests+for+external+systems
>>> > > This is the very first AIP I created, almost 1.5 years ago, and it took
>>> > > very long to get to the point where I think we are very, very close to
>>> > > being able to implement it after many, many baby steps (and some bigger
>>> > > leaps) that we've done in the meantime.
>>> > >
>>> > > *Let me just quickly summarise the context:*
>>> > > - One of the biggest advantages of Airflow is its integrations with
>>> > > external systems. We have, I think, several hundred hooks and operators
>>> > > working with those external systems.
>>> > > - We have an extensive set of tests - both unit and integration - that
>>> > > are sometimes really good at catching a lot of problems, but they can
>>> > > only do as much as mocking out access to the external systems allows.
>>> > > Unit/integration tests are great for testing the core of Airflow and
>>> > > its functionality, but the external services cannot be effectively
>>> > > tested this way.
>>> > > - The external services sometimes change - new versions of tools,
>>> > > services etc. are released every day, and sometimes even if we mock
>>> > > them perfectly in unit tests, the hooks simply stop working at some
>>> > > point in time.
>>> > > - I think there is a need to run some tests at the system level
>>> > > regularly - communicating with "real" external systems and testing our
>>> > > operators. Let's call them System Tests. They do not necessarily need to
>>> > > be run with every PR, but I think running them regularly makes perfect
>>> > > sense.
>>> > >
>>> > >
>>> > > *Why now? Why does this seem to be a good time to do it?*
>>> > > - We switched to pytest and we already have the separation of
>>> > > unit/integration tests in place - we can add support for system tests
>>> > > using the same mechanisms.
>>> > > - With AIP-21 we grouped the tests into the "providers" package, and that
>>> > > makes it easy to define the boundaries of "systems" - every provider is a
>>> > > "system" to test.
>>> > > - We have plenty of system tests implemented for GCP which we are going
>>> > > to use to run tests for backported packages from AIP-21 - we have followed
>>> > > system test automation for more than a year in GCP operators and we have
>>> > > it fully automated already.
>>> > > - In the latest PR - https://github.com/apache/airflow/pull/7389 - we
>>> > > even extracted the GCP-specific way we run system tests, in order to
>>> > > a) make it easy for everyone to write automated system tests and b) make
>>> > > it possible to automate them.
>>> > > - We have credits provided by Google to run our tests and we can use
>>> > > them for regular runs of the system tests.
>>> > > - We are close to the switch-over to GitHub Actions, which will make it
>>> > > easy to write manually triggered or regularly scheduled actions that will
>>> > > have securely stored credentials to run the system tests - in a way that
>>> > > is controlled by committers and not abusable by contributors who
>>> > > prepare PRs.
>>> > > - I would like to start and lead a community-driven effort where we will
>>> > > split the writing of missing tests amongst community members - so that
>>> > > our new backport packages can be tested against the latest released
>>> > > version of 1.10.*. We will provide GCP tests as examples, and we will also
>>> > > set up the automation needed to run the tests regularly - the only thing
>>> > > we will ask of the members of the community is to write missing tests.
>>> > > This way I hope we can get very high coverage of backported packages.
>>> > >
>>> > > There are of course still a number of open questions - like how to
>>> > > store credentials, how often to run the tests etc. - but I think those
>>> > > are implementation details that we can work out while we are
>>> > > implementing it.
>>> > >
>>> > > What do you think about it? If I get a lot of "yes"es quickly, I would
>>> > > love to start voting on AIP-4.
>>> > >
>>> > > J.
>>> > >
>>> > >
>>> > >
>>> > >
>>> > >
>>> > >
>>> > > --
>>> > > Jarek Potiuk
>>> > > Polidea <https://www.polidea.com/> | Principal Software Engineer
>>> > >
>>> > > M: +48 660 796 129 <+48660796129>
>>> >
>>> >
>>>
>>>
>
> --
>
> Jarek Potiuk
> Polidea <https://www.polidea.com/> | Principal Software Engineer
>
> M: +48 660 796 129 <+48660796129>
>
>

-- 

Jarek Potiuk
Polidea <https://www.polidea.com/> | Principal Software Engineer

M: +48 660 796 129 <+48660796129>
