Kaxil, in general I agree with fixing the operators in 1.10.*. The main problem is AIP-21, which, together with the many changes that have happened in the meantime, makes cherry-picking quite troublesome.
T.

On Mon, Apr 20, 2020 at 3:15 PM Kaxil Naik wrote:
>
> On a side note, regarding "Some operators for Google services that are in
> Airflow 1.10 have bugs that make it difficult or impossible to use them."
>
> Which operators are these? We should definitely fix any bugs that are
> in Airflow 1.10.10.
>
> We should definitely release backport packages, but we should also fix the
> bugs for the operators in the Airflow 1.10.* code.
>
> Regards,
> Kaxil
>
> On Mon, Apr 20, 2020 at 2:01 PM Ash Berlin-Taylor wrote:
>
> > You can prove just about anything with statistics. ;) There may be just
> > two packages under `http`, but it is likely much more frequently used
> > than some of the products in the Google suite.
> >
> > More seriously: I feel very unhappy about the idea of just releasing
> > the backport of the Google operators without all the rest because of how
> > it looks - Apache projects are meant to be independent of any
> > organisation, after all: https://www.apache.org/theapacheway/index.html
> >
> > Two options spring to mind:
> >
> > 1. Google donates some time to write system tests for the rest of the
> >    operators.
> > 2. We release them all with just unit tests.
> >
> > After all, option 2 is _exactly what we do now for releases_. Either
> > we're happy the unit tests cover things already, or we shouldn't be
> > making any releases. (Of anything, including the main airflow package.)
> >
> > On a slightly different subject: have we thought about how/what we are
> > going to version these packages?
> >
> > Are we going for (say) apache-airflow-provider-google==2.0.0, or perhaps
> > apache-airflow-provider-google==1.99 to mirror what grub2 did for a while?
> >
> > And an important question I think we need to answer before we publish
> > these: What happens to these packages once Airflow 2.0 is out? (Mostly
> > just how do we avoid any installation problems for our users in the
> > future.
> > Whether these packages live on or not can be addressed in AIP-8?)
> >
> > -ash
> >
> > On Apr 20 2020, at 12:29 pm, Jarek Potiuk wrote:
> >
> > > Great stats, Kamil :). I had not realized there is such a big imbalance
> > > when it comes to the number of operators :).
> > >
> > > I fully agree that 66%+ sounds like great value. And the stats tell me
> > > that maybe we are not that far away from testing everything :)
> > >
> > > J.
> > >
> > > On Mon, Apr 20, 2020 at 1:01 PM Kamil Breguła wrote:
> > >>
> > >> Hello,
> > >>
> > >> Thanks, Jarek, for dealing with this topic. It is very important for
> > >> our users. Many users want to use the new operators, but this is not
> > >> possible.
> > >>
> > >> In my opinion, we should not look only at the package names; their
> > >> content is more important. We should base our decisions on hard data.
> > >> For this reason, I have prepared some statistics. I counted how many
> > >> operators are in each package.
> > >>
> > >> 298 google
> > >>  49 amazon
> > >>  27 apache
> > >>  13 microsoft
> > >>   6 yandex
> > >>   6 qubole
> > >>   4 mysql
> > >>   3 slack
> > >>   3 redis
> > >>   3 jira
> > >>   3 cncf
> > >>   2 snowflake
> > >>   2 sftp
> > >>   2 salesforce
> > >>   2 oracle
> > >>   2 http
> > >>   2 ftp
> > >>   2 docker
> > >>   2 databricks
> > >>   1 vertica
> > >>   1 ssh
> > >>   1 sqlite
> > >>   1 singularity
> > >>   1 segment
> > >>   1 postgres
> > >>   1 papermill
> > >>   1 opsgenie
> > >>   1 mongo
> > >>   1 jenkins
> > >>   1 jdbc
> > >>   1 imap
> > >>   1 grpc
> > >>   1 exasol
> > >>   1 email
> > >>   1 discord
> > >>   1 dingding
> > >>   1 datadog
> > >>   1 celery
> > >>
> > >> So we have
> > >> 298 operators in the google package (66% of the total)
> > >> 152 operators in other packages
> > >>
> > >> Here is a list of all operators in Airflow master:
> > >> https://pastebin.com/GyARtGRC
> > >>
> > >> To generate the statistics I used the following commands:
> > >>
> > >> cat list-all.txt | grep providers | cut -d "." -f 3 | sort -n | uniq -c | sort -n -r
> > >>
> > >> cat list-all.txt | grep providers | cut -d "." -f 3 | sort -n | uniq -c | sort -n -r | grep google | awk '{sum += $1} END {print sum}'
> > >>
> > >> cat list-all.txt | grep providers | cut -d "." -f 3 | sort -n | uniq -c | sort -n -r | grep -v google | awk '{sum += $1} END {print sum}'
> > >>
> > >> Now we can ask another question - should we release packages with 66%+
> > >> of the operators? If not, what percentage would be appropriate?
> > >>
> > >> In my opinion, we should release tested packages as soon as possible.
> > >> This allows users to become better acquainted with this idea and, in
> > >> the long run, encourages more people to test other services as well.
> > >>
> > >> Some operators for Google services that are in Airflow 1.10 have bugs
> > >> that make it difficult or impossible to use them. Many operators have
> > >> also never been released in any Airflow 1.10 release. Many users who
> > >> want to use Airflow 2.0 operators write to me, and I don't have good
> > >> news for them. If I can't solve all the problems, then I would like to
> > >> at least solve the problem for some people rather than stand still.
> > >> Users expect that they will be able to use these operators now, so if
> > >> there are no technical obstacles, we should do it as soon as possible.
> > >>
> > >> Best regards,
> > >> Kamil
> > >>
> > >> On Mon, Apr 20, 2020 at 10:06 AM Jarek Potiuk wrote:
> > >> >
> > >> > I would like to focus this week on releasing backport packages.
> > >> > And I would like to ask you for opinions on what should be the first
> > >> > "bunch of packages" to release.
> > >> >
> > >> > The current status snapshot is here:
> > >> > https://cwiki.apache.org/confluence/display/AIRFLOW/Backported+providers+packages+for+Airflow+1.10.*+series
> > >> >
> > >> > We have a project in GitHub:
> > >> > https://github.com/apache/airflow/projects/2 where I keep the status
> > >> > of the packages, and if you drill down to the issues you will see that
> > >> > we have very well defined criteria for each of the packages to be
> > >> > "ready-to-release".
> > >> >
> > >> > I think adding system tests and actual testing is a slow process. We
> > >> > completed it for the "google", "Postgres" and "MySQL" packages, and I am
> > >> > planning to complete it for "HTTP" - and possibly a few simpler ones like
> > >> > "sftp" and "ssh" - myself this week. We also need to re-test it for
> > >> > 1.10.10, but since we have semi-automated system tests, it will be easy
> > >> > and I might even be able to automate it with GitHub Actions.
> > >> >
> > >> > However, the two important ones, "Microsoft" and "Amazon", are still
> > >> > quite far from completion (or even from starting, for "Microsoft").
> > >> >
> > >> > I might try to engage more people to do the testing, but I think there
> > >> > might also be value in releasing some first packages so that people
> > >> > start using them, and maybe then this will be a bigger incentive to do
> > >> > more testing and implement system tests for other packages.
> > >> >
> > >> > I am thinking about two release scenarios:
> > >> >
> > >> > 1) Google + postgres + mysql + http + ssh + sftp
> > >> >
> > >> > 2) Same as above, but we wait for "amazon" and "microsoft" to complete
> > >> >
> > >> > What do you think - should we release the first bunch of operators
> > >> > now? I personally think we should do that.
> > >> >
> > >> > J.
> > >> >
> > >> > --
> > >> > Jarek Potiuk
> > >> > Polidea | Principal Software Engineer
> > >> >
> > >> > M: +48 660 796 129

--
Tomasz Urbaszek
Polidea | Software Engineer
M: +48 505 628 493
