Kaxil, in general I agree with fixing the operators in 1.10.*. The main problem
is AIP-21, which, together with the many other changes made in the meantime,
makes cherry-picking quite troublesome.

T.


On Mon, Apr 20, 2020 at 3:15 PM Kaxil Naik <[email protected]> wrote:
>
> On a side note, regarding "Some operators for Google services that are in
> Airflow 1.10 have bugs that make it difficult or impossible to use them."
>
> Which operators are these? We should definitely fix any bugs that are
> in Airflow 1.10.10.
>
> We should definitely release backport packages but we should also fix the
> bugs for the operators in Airflow 1.10.* code.
>
> Regards,
> Kaxil
>
>
>
> On Mon, Apr 20, 2020 at 2:01 PM Ash Berlin-Taylor <[email protected]> wrote:
>
> > You can prove just about anything with statistics. ;) There may be just
> > two operators under `http`, but they are likely used much more frequently
> > than some of the products in the Google suite.
> >
> > More seriously: I feel very unhappy about the idea of releasing only the
> > Google operator backports without all the rest, because of how it looks.
> > Apache projects are meant to be independent of any organisation, after
> > all: https://www.apache.org/theapacheway/index.html
> >
> > Two options spring to mind:
> >
> > 1. Google donates some time to write system tests for the rest of the
> > operators.
> > 2. We release them all with just unit tests.
> >
> > After all, option 2 is _exactly what we do now for releases_. Either
> > we're happy that the unit tests already cover things, or we shouldn't be
> > making any releases (of anything, including the main airflow package).
> >
> > On a slightly different subject: have we thought about how we are going
> > to version these packages?
> >
> > Are we going for (say) apache-airflow-provider-google==2.0.0, or perhaps
> > apache-airflow-provider-google==1.99, to mirror what grub2 did for a while?
> >
> > And an important question I think we need to answer before we publish
> > these: what happens to these packages once Airflow 2.0 is out? (Mostly
> > just: how do we avoid any installation problems for our users in the
> > future? Whether these packages live on or not can perhaps be addressed
> > in AIP-8.)
> >
> > -ash
> >
> >
> > On Apr 20 2020, at 12:29 pm, Jarek Potiuk <[email protected]>
> > wrote:
> >
> > > Great stats, Kamil :). I had not realized there was such a big imbalance
> > > in the number of operators :).
> > >
> > > I fully agree: 66%+ sounds like great value. And the stats tell me that
> > > maybe we are not that far away from testing everything :)
> > >
> > > J.
> > >
> > > On Mon, Apr 20, 2020 at 1:01 PM Kamil Breguła
> > > <[email protected]> wrote:
> > >>
> > >> Hello,
> > >>
> > >> Thanks, Jarek, for taking on this topic. It is very important for
> > >> our users. Many users want to use the new operators, but currently
> > >> they cannot.
> > >>
> > >> In my opinion, we should look not only at the package names; their
> > >> content is more important. We should base our decisions on hard data.
> > >> For this reason, I have prepared some statistics: I counted how many
> > >> operators are in each package.
> > >>
> > >>     298 google
> > >>      49 amazon
> > >>      27 apache
> > >>      13 microsoft
> > >>       6 yandex
> > >>       6 qubole
> > >>       4 mysql
> > >>       3 slack
> > >>       3 redis
> > >>       3 jira
> > >>       3 cncf
> > >>       2 snowflake
> > >>       2 sftp
> > >>       2 salesforce
> > >>       2 oracle
> > >>       2 http
> > >>       2 ftp
> > >>       2 docker
> > >>       2 databricks
> > >>       1 vertica
> > >>       1 ssh
> > >>       1 sqlite
> > >>       1 singularity
> > >>       1 segment
> > >>       1 postgres
> > >>       1 papermill
> > >>       1 opsgenie
> > >>       1 mongo
> > >>       1 jenkins
> > >>       1 jdbc
> > >>       1 imap
> > >>       1 grpc
> > >>       1 exasol
> > >>       1 email
> > >>       1 discord
> > >>       1 dingding
> > >>       1 datadog
> > >>       1 celery
> > >>
> > >> So we have:
> > >> 298 operators in the google package (66% of the 450 total)
> > >> 152 operators in other packages
> > >>
> > >> Here is a list of all operators in Airflow master:
> > >> https://pastebin.com/GyARtGRC
> > >> To generate the statistics, I used the following commands:
> > >> # operators per provider package
> > >> grep providers list-all.txt | cut -d "." -f 3 | sort | uniq -c | sort -n -r
> > >> # sum of operators in the google package
> > >> grep providers list-all.txt | cut -d "." -f 3 | sort | uniq -c | grep google | awk '{sum += $1} END {print sum}'
> > >> # sum of operators in all other packages
> > >> grep providers list-all.txt | cut -d "." -f 3 | sort | uniq -c | grep -v google | awk '{sum += $1} END {print sum}'
> > >>
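The statistics pipeline in Kamil's message can be tried out on a tiny input. This is a minimal sketch, assuming a POSIX shell with grep, cut, sort, uniq and awk available: the operator paths written to sample-list.txt are hypothetical placeholders rather than real Airflow classes (the real input is the pastebin list), and the percentage step at the end is an added convenience that is not part of the original commands.

```shell
#!/bin/sh
# Hypothetical sample input; the real list-all.txt is the pastebin list.
# Each line is a fully qualified operator class path.
cat > sample-list.txt <<'EOF'
airflow.providers.google.cloud.operators.example.FirstExampleOperator
airflow.providers.google.cloud.operators.example.SecondExampleOperator
airflow.providers.amazon.aws.operators.example.ThirdExampleOperator
airflow.operators.example.CoreExampleOperator
EOF

# Field 3 of airflow.providers.<provider>.<module path> is the provider name.
# A plain sort groups identical names so that uniq -c can count them.
grep providers sample-list.txt | cut -d "." -f 3 | sort | uniq -c | sort -n -r

# Operators in the google package vs. all provider operators
# (the core-only line without "providers" is ignored).
google_count=$(grep providers sample-list.txt | cut -d "." -f 3 | grep -c '^google$')
total_count=$(grep -c providers sample-list.txt)

# Share of the google package, as a percentage (hypothetical extra step).
awk -v g="$google_count" -v t="$total_count" \
    'BEGIN { printf "google: %d of %d operators (%.1f%%)\n", g, t, 100 * g / t }'
```

With the totals from this thread, the same formula gives 100 * 298 / (298 + 152) = 66.2%, which matches the "66%" figure above.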
> > >> Now we can ask another question: should we release packages that
> > >> cover 66%+ of the operators? If not, what percentage would be appropriate?
> > >>
> > >> In my opinion, we should release tested packages as soon as possible.
> > >> This would let users become better acquainted with the idea and, in
> > >> the long run, encourage more people to test other services as well.
> > >>
> > >> Some operators for Google services that are in Airflow 1.10 have bugs
> > >> that make it difficult or impossible to use them. Many operators have
> > >> also never been included in any Airflow 1.10 release. Many users write
> > >> to me wanting to use the Airflow 2.0 operators, and I don't have good
> > >> news for them. If I can't solve all the problems, I would still like to
> > >> solve them for at least some people rather than stand still. Users
> > >> expect to be able to use these operators now, so if there are no
> > >> technical obstacles, we should release them as soon as possible.
> > >>
> > >> Best regards,
> > >> Kamil
> > >>
> > >> On Mon, Apr 20, 2020 at 10:06 AM Jarek Potiuk
> > >> <[email protected]> wrote:
> > >> >
> > >> > I would like to focus this week on releasing the backport packages,
> > >> > and I would like to ask for your opinions on which "bunch of packages"
> > >> > we should release first.
> > >> >
> > >> > The current status snapshot is here:
> > >> >
> > >> > https://cwiki.apache.org/confluence/display/AIRFLOW/Backported+providers+packages+for+Airflow+1.10.*+series
> > >> >
> > >> > We have a project on GitHub (https://github.com/apache/airflow/projects/2)
> > >> > where I keep track of the status of the packages. If you drill down into
> > >> > the issues, you will see that we have very well-defined criteria for each
> > >> > package to be "ready-to-release".
> > >> >
> > >> > I think adding system tests and actual testing is a slow process. We
> > >> > completed it for the "google", "Postgres" and "MySQL" packages, and I am
> > >> > planning to complete it myself this week for "HTTP" and possibly a few
> > >> > simpler ones like "sftp" and "ssh". We also need to re-test them against
> > >> > 1.10.10, but since we have semi-automated system tests, this will be easy
> > >> > and I might even be able to automate it with GitHub Actions.
> > >> >
> > >> > However, the two important ones, "Microsoft" and "Amazon", are still
> > >> > quite far from completion (for "Microsoft", work has not even started).
> > >> >
> > >> > I might try to get more people involved in the testing, but I think there
> > >> > might also be value in releasing a first batch of packages so that people
> > >> > start using them; that might then be a bigger incentive to do more testing
> > >> > and implement system tests for other packages.
> > >> >
> > >> > I am thinking about two release scenarios:
> > >> >
> > >> > 1) Google + postgres + mysql + http + ssh + sftp
> > >> >
> > >> > 2) Same as above, but we wait for "amazon" and "microsoft" to complete
> > >> >
> > >> > What do you think? Should we release the first bunch of operators
> > >> > now? I personally think we should do that.
> > >> >
> > >> > J.
> > >> >
> > >> >
> > >> >
> > >> > --
> > >> > Jarek Potiuk
> > >> > Polidea | Principal Software Engineer
> > >> >
> > >> > M: +48 660 796 129
> > >
> > >
> > >
> > > --
> > >
> > > Jarek Potiuk
> > > Polidea | Principal Software Engineer
> > >
> > > M: +48 660 796 129
> > >
> >



-- 

Tomasz Urbaszek
Polidea | Software Engineer

M: +48 505 628 493
E: [email protected]
