> Kaxil, in general, I agree with fixing ops in 1.10.*. The main problem
> is AIP-21 which together with a lot of changes that happened in the
> meantime makes cherrypicking quite troublesome.

Who knows this better than me :). I definitely agree that cherry-picking
is very troublesome.

I was just referring to bugs, though: we should fix them, and they should
not be the reason for releasing backport packages. We should release
backport packages because the gap between the number of operators in
Airflow master and in 1.10.* is growing, and it is definitely irritating
for users and contributors of those operators/hooks not to see their work
make it into a release even though it is in master.

I agree with Ash: let's not make "system tests" a hard criterion for
releasing backport packages. It is ideal if we have them, but even if we
don't, we should still release.

CalVer makes sense for versioning.

On Mon, Apr 20, 2020 at 3:12 PM Tomasz Urbaszek wrote:

> Kaxil, in general, I agree with fixing ops in 1.10.*. The main problem
> is AIP-21 which together with a lot of changes that happened in the
> meantime makes cherrypicking quite troublesome.
>
> T.
>
> On Mon, Apr 20, 2020 at 3:15 PM Kaxil Naik wrote:
> >
> > On a side note, regarding "Some operators for Google services that are
> > in Airflow 1.10 have bugs that make it difficult or impossible to use
> > them."
> >
> > Which operators are these? We should definitely fix any bugs that are
> > in Airflow 1.10.10.
> >
> > We should definitely release backport packages, but we should also fix
> > the bugs for the operators in the Airflow 1.10.* code.
> >
> > Regards,
> > Kaxil
> >
> > On Mon, Apr 20, 2020 at 2:01 PM Ash Berlin-Taylor wrote:
> >
> > > You can prove just about anything with statistics. ;) There may just
> > > be two packages under `http`, but it is likely much more frequently
> > > used than some of the products in the Google suite.
> > >
> > > More seriously: I feel very unhappy about the idea of just releasing
> > > the backport of the Google operators without all the rest because of
> > > how it looks. Apache projects are meant to be independent of any
> > > organisation, after all:
> > > https://www.apache.org/theapacheway/index.html
> > >
> > > Two options spring to mind:
> > >
> > > 1. Google donates some time to write system tests for the rest of the
> > >    operators.
> > > 2. We release them all with just unit tests.
> > >
> > > After all, option 2 is _exactly what we do now for releases_. Either
> > > we're happy the unit tests cover things already, or we shouldn't be
> > > making any releases. (Of anything, including the main airflow package.)
> > >
> > > On a slightly different subject: have we thought about how we are
> > > going to version these packages?
> > >
> > > Are we going for (say) apache-airflow-provider-google==2.0.0, or
> > > perhaps apache-airflow-provider-google==1.99 to mirror what grub2 did
> > > for a while?
> > >
> > > And an important question I think we need to answer before we publish
> > > these: what happens to these packages once Airflow 2.0 is out? (Mostly
> > > just how do we avoid any installation problems for our users in the
> > > future. Whether these packages live on or not can be addressed in
> > > AIP-8?)
> > >
> > > -ash
> > >
> > > On Apr 20 2020, at 12:29 pm, Jarek Potiuk wrote:
> > >
> > > > Great stats, Kamil :). I had not realized there is such a big
> > > > imbalance when it comes to the number of operators :).
> > > >
> > > > I fully agree that 66%+ sounds like great value. And the stats tell
> > > > me that maybe we are not that far away from testing everything :)
> > > >
> > > > J.
> > > >
> > > > On Mon, Apr 20, 2020 at 1:01 PM Kamil Breguła wrote:
> > > >>
> > > >> Hello,
> > > >>
> > > >> Thanks, Jarek, for dealing with this topic. It is very important
> > > >> for our users.
> > > >> Many users want to use new operators, but this is not possible.
> > > >>
> > > >> In my opinion, we should look not only at the package names; their
> > > >> content is more important. We should base our decisions on hard
> > > >> data. For this reason, I have prepared some statistics. I counted
> > > >> how many operators are in each package.
> > > >>
> > > >> 298 google
> > > >> 49 amazon
> > > >> 27 apache
> > > >> 13 microsoft
> > > >> 6 yandex
> > > >> 6 qubole
> > > >> 4 mysql
> > > >> 3 slack
> > > >> 3 redis
> > > >> 3 jira
> > > >> 3 cncf
> > > >> 2 snowflake
> > > >> 2 sftp
> > > >> 2 salesforce
> > > >> 2 oracle
> > > >> 2 http
> > > >> 2 ftp
> > > >> 2 docker
> > > >> 2 databricks
> > > >> 1 vertica
> > > >> 1 ssh
> > > >> 1 sqlite
> > > >> 1 singularity
> > > >> 1 segment
> > > >> 1 postgres
> > > >> 1 papermill
> > > >> 1 opsgenie
> > > >> 1 mongo
> > > >> 1 jenkins
> > > >> 1 jdbc
> > > >> 1 imap
> > > >> 1 grpc
> > > >> 1 exasol
> > > >> 1 email
> > > >> 1 discord
> > > >> 1 dingding
> > > >> 1 datadog
> > > >> 1 celery
> > > >>
> > > >> So we have
> > > >> 298 operators in google package (66% of total)
> > > >> 152 operators in other packages
> > > >>
> > > >> Here is a list of all operators in Airflow master:
> > > >> https://pastebin.com/GyARtGRC
> > > >>
> > > >> To generate the statistics I used the following commands:
> > > >> cat list-all.txt | grep providers | cut -d "." -f 3 | sort -n | uniq -c | sort -n -r
> > > >> cat list-all.txt | grep providers | cut -d "." -f 3 | sort -n | uniq -c | sort -n -r | grep google | awk '{sum += $1} END {print sum}'
> > > >> cat list-all.txt | grep providers | cut -d "." -f 3 | sort -n | uniq -c | sort -n -r | grep -v google | awk '{sum += $1} END {print sum}'
> > > >>
> > > >> Now we can ask another question: should we release packages with
> > > >> 66+% of the operators? If not, what percentage would be appropriate?
> > > >>
> > > >> In my opinion, we should release tested packages as soon as
> > > >> possible. This allows users to become better acquainted with this
> > > >> idea and, in the long run, encourages more people to test other
> > > >> services as well.
> > > >>
> > > >> Some operators for Google services that are in Airflow 1.10 have
> > > >> bugs that make it difficult or impossible to use them. Many
> > > >> operators have also never been released in any Airflow 1.10
> > > >> release. Many users who want to use Airflow 2.0 operators write to
> > > >> me, and I don't have good news for them. If I can't solve all the
> > > >> problems, then I would like to at least solve the problem for a few
> > > >> people rather than stand still. Users expect that they will be able
> > > >> to use these operators now, so if there are no technical obstacles
> > > >> then we should do it as soon as possible.
> > > >>
> > > >> Best regards,
> > > >> Kamil
> > > >>
> > > >> On Mon, Apr 20, 2020 at 10:06 AM Jarek Potiuk wrote:
> > > >> >
> > > >> > I would like to focus this week on releasing backport packages.
> > > >> > And I would like to ask you for opinions on what should be the
> > > >> > first "bunch of packages" to release.
> > > >> >
> > > >> > The current status snapshot is here:
> > > >> > https://cwiki.apache.org/confluence/display/AIRFLOW/Backported+providers+packages+for+Airflow+1.10.*+series
> > > >> >
> > > >> > We have a project in GitHub:
> > > >> > https://github.com/apache/airflow/projects/2 where I keep the
> > > >> > status of the packages, and if you drill down to the issues you
> > > >> > will see that we have very well defined criteria for each of the
> > > >> > packages to be "ready-to-release".
> > > >> >
> > > >> > I think adding system tests and actual testing is a slow process.
> > > >> > We completed it for the "google", "Postgres" and "MySQL" packages,
> > > >> > and I am planning to complete it for "HTTP", and possibly a few
> > > >> > simpler ones like "sftp" and "ssh", myself this week. We also need
> > > >> > to re-test it for 1.10.10, but since we have semi-automated system
> > > >> > tests, it will be easy and I might even be able to automate it
> > > >> > with GitHub Actions.
> > > >> >
> > > >> > However, the two important ones, "Microsoft" and "Amazon", are
> > > >> > still quite far from completion (or even from starting, for
> > > >> > "Microsoft").
> > > >> >
> > > >> > I might try to engage more people to do the testing, but I think
> > > >> > there might also be value in releasing some first packages so
> > > >> > that people start using them, and maybe then this will be a
> > > >> > bigger incentive to do more testing and implement system tests
> > > >> > for other packages.
> > > >> >
> > > >> > I am thinking about two release scenarios:
> > > >> >
> > > >> > 1) Google + postgres + mysql + http + ssh + sftp
> > > >> >
> > > >> > 2) Same as above, but we wait for "amazon" and "microsoft" to
> > > >> >    complete
> > > >> >
> > > >> > What do you think: should we release the first bunch of operators
> > > >> > now? I personally think we should do that.
> > > >> >
> > > >> > J.
> > > >> >
> > > >> > --
> > > >> > Jarek Potiuk
> > > >> > Polidea | Principal Software Engineer
> > > >> >
> > > >> > M: +48 660 796 129
>
> --
>
> Tomasz Urbaszek
> Polidea | Software Engineer
>
> M: +48 505 628 493
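
For anyone who wants to reproduce the per-package counts quoted in the thread without the shell pipeline, here is a minimal Python sketch of the same idea. It assumes (this is not confirmed anywhere in the thread) that list-all.txt holds one dotted path per line of the form airflow.providers.<package>...., so that the third dot-separated field is the package name, as the `cut -d "." -f 3` step implies. The function names and the sample paths are hypothetical and only there for illustration.

```python
from collections import Counter


def provider_counts(lines):
    """Count entries per provider package.

    Roughly mirrors: grep providers | cut -d "." -f 3 | sort | uniq -c
    Assumption: each line is a dotted path like airflow.providers.<package>...
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if "providers" not in line:  # same filter as `grep providers`
            continue
        fields = line.split(".")
        if len(fields) < 3:  # no third field to take; skip the line
            continue
        counts[fields[2]] += 1  # third field, like `cut -d "." -f 3`
    return counts


def share(counts, package):
    """Fraction of all counted entries that belong to `package`."""
    total = sum(counts.values())
    return counts[package] / total if total else 0.0


# Hypothetical sample input; these are placeholder paths, not real operators.
SAMPLE_LINES = [
    "airflow.providers.google.example_module.ExampleOperatorA",
    "airflow.providers.google.example_module.ExampleOperatorB",
    "airflow.providers.amazon.example_module.ExampleOperatorC",
    "airflow.operators.example_module.NotAProviderOperator",
]

counts = provider_counts(SAMPLE_LINES)
for package, n in counts.most_common():  # highest count first, like `sort -n -r`
    print(n, package)

# The totals quoted in the thread: 298 google operators and 152 others.
thread_totals = Counter({"google": 298, "other": 152})
print(round(share(thread_totals, "google") * 100), "% google")
```

With the numbers from the thread, 298 / (298 + 152) comes to roughly 66%, which matches the figure Kamil quotes. To run it against the real list, pass `open("list-all.txt")` to `provider_counts` instead of the sample lines.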
