Great stats, Kamil :). I had not realized there is such a big imbalance
when it comes to the number of operators :).

I fully agree that 66%+ sounds like great value. And the stats tell me
that maybe we are not that far away from testing everything :)

J.

On Mon, Apr 20, 2020 at 1:01 PM Kamil Breguła <[email protected]> wrote:
>
> Hello,
>
> Thanks, Jarek, for taking on this topic. It is very important for
> our users. Many users want to use the new operators, but this is not
> possible today.
>
> In my opinion, we should look not only at the package names; their
> content is more important. We should base our decisions on hard data.
> For this reason, I have prepared some statistics. I counted how many
> operators are in each package.
>
>     298 google
>      49 amazon
>      27 apache
>      13 microsoft
>       6 yandex
>       6 qubole
>       4 mysql
>       3 slack
>       3 redis
>       3 jira
>       3 cncf
>       2 snowflake
>       2 sftp
>       2 salesforce
>       2 oracle
>       2 http
>       2 ftp
>       2 docker
>       2 databricks
>       1 vertica
>       1 ssh
>       1 sqlite
>       1 singularity
>       1 segment
>       1 postgres
>       1 papermill
>       1 opsgenie
>       1 mongo
>       1 jenkins
>       1 jdbc
>       1 imap
>       1 grpc
>       1 exasol
>       1 email
>       1 discord
>       1 dingding
>       1 datadog
>       1 celery
>
> So we have:
> 298 operators in the google package (66% of the 450 total)
> 152 operators in all other packages
>
> Here is a list of all operators in Airflow master: 
> https://pastebin.com/GyARtGRC
> To generate the statistics I used the following commands (per-package
> counts, the google total, and the total for everything else):
> cat list-all.txt | grep providers | cut -d "." -f 3 | sort -n |
>   uniq -c | sort -n -r
> cat list-all.txt | grep providers | cut -d "." -f 3 | sort -n |
>   uniq -c | sort -n -r | grep google | awk '{sum += $1} END {print sum}'
> cat list-all.txt | grep providers | cut -d "." -f 3 | sort -n |
>   uniq -c | sort -n -r | grep -v google | awk '{sum += $1} END {print sum}'
>
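For anyone who prefers Python to the shell pipeline, the same counting can be sketched roughly as below. This is only an illustration, not the script that produced the numbers above. It assumes each line of the list is a dotted path of the form airflow.providers.<package>...., which is what the `cut -d "." -f 3` step implies. The function names and the sample paths are hypothetical.

```python
from collections import Counter


def count_by_provider(class_paths):
    """Count entries per provider package.

    Assumes dotted paths such as 'airflow.providers.google.x.Y'
    (an assumption inferred from the shell commands, not verified).
    """
    counts = Counter()
    for path in class_paths:
        parts = path.strip().split(".")
        # Roughly mirrors `grep providers | cut -d "." -f 3`, but stricter:
        # the second component must be exactly "providers".
        if len(parts) >= 3 and parts[1] == "providers":
            counts[parts[2]] += 1
    return counts


def share(counts, package):
    """Return the percentage of all counted entries that belong to `package`."""
    total = sum(counts.values())
    return 100.0 * counts[package] / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical sample paths, for illustration only.
    sample = [
        "airflow.providers.google.example.ExampleOperatorA",
        "airflow.providers.google.example.ExampleOperatorB",
        "airflow.providers.amazon.example.ExampleOperatorC",
        "airflow.operators.example.NotAProviderOperator",
    ]
    counts = count_by_provider(sample)
    for package, number in counts.most_common():
        print(f"{number:7d} {package}")
    print(f"google share: {share(counts, 'google'):.1f}%")
```

To run it against the real list, one would pass the lines of list-all.txt (for example `open("list-all.txt")`) instead of the sample.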
> Now we can ask another question: should we release packages that
> cover 66%+ of the operators? If not, what percentage would be
> appropriate?
>
> In my opinion, we should release tested packages as soon as possible.
> This will let users get better acquainted with the idea and, in the
> long run, encourage more people to test other services as well.
>
> Some operators for Google services that are in Airflow 1.10 have bugs
> that make them difficult or impossible to use. Many operators have
> also never been released in any Airflow 1.10 release. Many users who
> want to use Airflow 2.0 operators write to me, and I don't have good
> news for them. If I can't solve all the problems, I would at least
> like to solve them for some people rather than stand still. Users
> expect to be able to use these operators now, so if there are no
> technical obstacles, we should do it as soon as possible.
>
> Best regards,
> Kamil
>
> On Mon, Apr 20, 2020 at 10:06 AM Jarek Potiuk <[email protected]> wrote:
> >
> > I would like to focus this week on releasing backport packages. And I
> > would like to ask you for opinions on what should be the first "bunch
> > of packages" to release:
> >
> > The current status snapshot is here:
> > https://cwiki.apache.org/confluence/display/AIRFLOW/Backported+providers+packages+for+Airflow+1.10.*+series
> >
> > We have a project in GitHub:
> > https://github.com/apache/airflow/projects/2 where I keep the status
> > of the packages and if you drill down to issues you will see that we
> > have very well defined criteria for each of the packages to be
> > "ready-to-release".
> >
> > I think adding system tests and actual testing is a slow process. We
> > completed it for the "google", "Postgres" and "MySQL" packages, and I
> > am planning to complete it for "HTTP", and possibly a few simpler ones
> > like "sftp" and "ssh", myself this week. We also need to re-test
> > everything for 1.10.10, but since we have semi-automated system tests,
> > it will be easy and I might even be able to automate it with GitHub
> > Actions.
> >
> > However, the two important ones, "Microsoft" and "Amazon", are still
> > quite far from completion (or even from starting, for "Microsoft").
> >
> > I might try to engage more people to do the testing, but I think there
> > might also be value in releasing some first packages so that people
> > start using them; maybe this will then be a bigger incentive to do
> > more testing and implement system tests for other packages.
> >
> > I am thinking about two release scenarios:
> >
> > 1) Google + postgres + mysql + http + ssh + sftp
> >
> > 2) Same as above, but we wait for "amazon" and "microsoft" to complete
> >
> > What do you think - should we release the first bunch of operators
> > now? I personally think we should do that.
> >
> > J.
> >
> >
> >
> > --
> > Jarek Potiuk
> > Polidea | Principal Software Engineer
> >
> > M: +48 660 796 129



-- 

Jarek Potiuk
Polidea | Principal Software Engineer

M: +48 660 796 129
