Hello,

Thanks, Jarek, for taking on this topic. It is very important for our
users: many of them want to use the new operators, but currently they
cannot.

In my opinion, we should look not only at the package names but, more
importantly, at their content. We should base our decisions on hard
data, so I have prepared some statistics: I counted how many operators
are in each package.

    298 google
     49 amazon
     27 apache
     13 microsoft
      6 yandex
      6 qubole
      4 mysql
      3 slack
      3 redis
      3 jira
      3 cncf
      2 snowflake
      2 sftp
      2 salesforce
      2 oracle
      2 http
      2 ftp
      2 docker
      2 databricks
      1 vertica
      1 ssh
      1 sqlite
      1 singularity
      1 segment
      1 postgres
      1 papermill
      1 opsgenie
      1 mongo
      1 jenkins
      1 jdbc
      1 imap
      1 grpc
      1 exasol
      1 email
      1 discord
      1 dingding
      1 datadog
      1 celery

So, out of 450 operators in total, we have:
298 operators in the google package (66% of the total)
152 operators in all other packages
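For reference, the 66% figure follows directly from the two totals
(298 + 152 = 450 operators overall):

```latex
\[
  \frac{298}{298 + 152} = \frac{298}{450} \approx 0.662 \approx 66\%
\]
```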

Here is a list of all operators in Airflow master: https://pastebin.com/GyARtGRC
To generate the statistics, I used the following commands:

    grep providers list-all.txt | cut -d "." -f 3 | sort | uniq -c | sort -n -r
    grep providers list-all.txt | cut -d "." -f 3 | sort | uniq -c | sort -n -r | grep google | awk '{sum += $1} END {print sum}'
    grep providers list-all.txt | cut -d "." -f 3 | sort | uniq -c | sort -n -r | grep -v google | awk '{sum += $1} END {print sum}'
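The three pipelines above each re-read the list. As a sketch only, the
same numbers could be produced in a single awk pass. This assumes, as
the `cut -d "." -f 3` step does, that every matching line in
list-all.txt is a dotted path whose third component is the provider
package (airflow.providers.<package>.<...>); the function name
`count_operators` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not part of Airflow): counts operators per provider
# package in one pass and reports the google share.
# Assumption: each input line is a dotted path such as
#   airflow.providers.<package>.<...>.<OperatorName>
# Lines without "providers" are skipped, like the `grep providers` step.
count_operators() {
  awk -F. '
    /providers/ {
      count[$3]++                       # field 3 = provider package name
      if ($3 == "google") google++      # exact match, not a substring match
      else other++
    }
    END {
      # Print the per-package table, largest first (ties sorted by name).
      for (pkg in count)
        printf "%7d %s\n", count[pkg], pkg | "sort -k1,1nr -k2,2"
      close("sort -k1,1nr -k2,2")       # wait for sort before printing totals
      total = google + other
      if (total > 0)
        printf "google=%d other=%d share=%.1f%%\n", google, other, 100 * google / total
    }'
}

# Usage: count_operators < list-all.txt
```

Unlike `grep google` / `grep -v google`, which match any package name
containing "google", the sketch compares the package name exactly; for
the list above both approaches give the same result.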

Now we can ask another question: should we release the packages that
cover 66% or more of the operators? If not, what percentage would be
appropriate?

In my opinion, we should release the tested packages as soon as
possible. This will let users become familiar with the idea and, in the
long run, encourage more people to test the other services as well.

Some operators for Google services in Airflow 1.10 have bugs that make
them difficult or impossible to use, and many operators have never been
released in any Airflow 1.10 release at all. Many users write to me
wanting to use the Airflow 2.0 operators, and I don't have good news for
them. If I can't solve all the problems, I would at least like to solve
them for some of the users instead of standing still. Users expect to be
able to use these operators now, so if there are no technical obstacles,
we should release them as soon as possible.

Best regards,
Kamil

On Mon, Apr 20, 2020 at 10:06 AM Jarek Potiuk <[email protected]> wrote:
>
> I would like to focus this week on releasing the backport packages, and
> I would like to ask for your opinions on what the first "bunch of
> packages" to release should be:
>
> The current status snapshot is here:
> https://cwiki.apache.org/confluence/display/AIRFLOW/Backported+providers+packages+for+Airflow+1.10.*+series
>
> We have a project on GitHub:
> https://github.com/apache/airflow/projects/2 where I keep track of the
> status of the packages. If you drill down to the issues, you will see
> that we have very well-defined criteria for each of the packages to be
> "ready-to-release".
>
> I think adding system tests and doing the actual testing is a slow
> process. We completed it for the "google", "Postgres" and "MySQL"
> packages, and I am planning to complete it myself this week for "HTTP"
> and possibly a few simpler ones like "sftp" and "ssh". We also need to
> re-test them for 1.10.10, but since we have semi-automated system tests,
> this will be easy, and I might even be able to automate it with GitHub
> Actions.
>
> However, the two important ones, "Microsoft" and "Amazon", are still
> quite far from completion (for "Microsoft", work has not even started).
>
> I might try to engage more people in the testing, but I think there
> might also be value in releasing some first packages so that people
> start using them; this may then be a bigger incentive to do more
> testing and to implement system tests for the other packages.
>
> I am thinking about two release scenarios:
>
> 1) Google + postgres + mysql + http + ssh + sftp
>
> 2) Same as above, but we wait for "amazon" and "microsoft" to complete
>
> What do you think: should we release the first bunch of operators
> now? I personally think we should.
>
> J.
>
>
>
> --
> Jarek Potiuk
> Polidea | Principal Software Engineer
>
> M: +48 660 796 129
