Hi all,

A while back I mentioned a dev project I was working on, and there's some
progress:

https://github.com/gtoonstra/airflow-hovercraft


There are two types of tests:

1. Behavior tests: these test the behavior of operators against a
stubbed-out "hook", driven through Python "behave" scripts. The behave
scripts read a bit more easily and show the data used in each test.

2. Hook tests: these pull in docker containers with popular databases and
then test the hook methods against them. There's a provisioning system for
these containers using YAML files, where you can tweak, reconfigure and
load the containers with other data if required. This part is still
very immature, but I hope it shows the potential. There are currently
single-method tests for hive, s3, samba, ftps, mssql, mysql and postgres.
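
To illustrate the stubbed-hook idea from item 1, here is a minimal sketch
in plain Python. It uses unittest.mock rather than behave, and the operator
class, the get_records method name and the SQL string are hypothetical
stand-ins, not code from the repository:

```python
from unittest import mock


class CountRowsOperator:
    """Hypothetical operator: asks a hook for rows and returns the count."""

    def __init__(self, hook, sql):
        self.hook = hook
        self.sql = sql

    def execute(self):
        rows = self.hook.get_records(self.sql)
        return len(rows)


def run_with_stubbed_hook():
    # The stub stands in for a real database hook, so no connection is made.
    stub_hook = mock.Mock()
    stub_hook.get_records.return_value = [(1, "a"), (2, "b")]

    operator = CountRowsOperator(hook=stub_hook, sql="SELECT * FROM t")
    result = operator.execute()

    # Check that the operator drove the hook the way we expect.
    stub_hook.get_records.assert_called_once_with("SELECT * FROM t")
    return result
```

The point of the stub is that only the operator's own logic is exercised;
whether the real hook talks to the database correctly is left to the hook
tests in item 2.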


The docker containers are managed through "hovertools", which I split out
because it can be reused when testing DAGs end-to-end: you can use the
tools to reset containers prior to triggering the DAG.


I found three issues in Airflow hooks along the way:
- FTPSHook doesn't get initialized correctly (could be related to Python 3):
the socket doesn't get wrapped and the connection fails, at least in my
testing.
- The samba hook uses an SMB implementation that hasn't been updated
since 2012 and fails on a string/bytes encoding issue.
- The s3 hook couldn't connect to a custom port, which this testing needs.
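
For readers unfamiliar with the string/bytes class of failure mentioned for
the samba hook, here is a generic Python 3 sketch. It is an assumption about
the kind of error involved, not the actual failing code in the SMB library;
both function names are hypothetical:

```python
def naive_build_message(header, name):
    # Python 2 allowed mixing str and bytes here; Python 3 raises TypeError
    # when header is bytes and name is str.
    return header + name


def build_message(header: bytes, name) -> bytes:
    # Encode text explicitly before combining it with raw bytes.
    if isinstance(name, str):
        name = name.encode("utf-8")
    return header + name
```

Calling naive_build_message(b"\x00", "share") raises a TypeError under
Python 3, while build_message accepts either a str or a bytes name.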


Best regards,

Gerard
