One case popped up for us recently where it made sense to make a MsSql*From*S3Operator.
I think using "source" makes sense in general, but in this case calling this an S3ToMsSqlOperator and putting it under AWS seems silly, even though you could say S3 is the "source" here. I think in most of these cases we say "let's use source" because the source is where the actual work is done and the destination is just storage. Does a guideline saying "ignore storage" or "storage is secondary in object location" make sense?

On Fri, Oct 4, 2019 at 6:42 AM Jarek Potiuk <jarek.pot...@polidea.com> wrote:

> It looks like we have general consensus about putting transfer operators
> into the "source provider" package.
> That's great for me as well.
>
> Since I will be updating AIP-21 to reflect the "google" vs. "gcp" case, I
> will also update it to add this decision.
>
> If no-one objects (Lazy Consensus
> <https://community.apache.org/committers/lazyConsensus.html>) till
> Monday, 7th of October, 3.20 CEST, we will update AIP-21 with information
> that transfer operators should be placed in the "source" provider module.
>
> J.
>
> On Tue, Sep 24, 2019 at 1:34 PM Kamil Breguła <kamil.breg...@polidea.com>
> wrote:
>
> > On Mon, Sep 23, 2019 at 7:42 PM Chris Palmer <ch...@crpalmer.com> wrote:
> > >
> > > On Mon, Sep 23, 2019 at 1:22 PM Kamil Breguła <kamil.breg...@polidea.com>
> > > wrote:
> > >
> > > > On Mon, Sep 23, 2019 at 7:04 PM Chris Palmer <ch...@crpalmer.com> wrote:
> > > > >
> > > > > Is there a reason why we can't use symlinks to have copies of the files
> > > > > show up in both subpackages? So that `gcs_to_s3.py` would be under both
> > > > > `aws/operators/` and `gcp/operators`. I could imagine there may be
> > > > > technical reasons why this is a bad idea, but just thought I would ask.
> > > > >
> > > > Symlinks are not supported by git.
> > > >
> > >
> > > Why do you say that? This blog post
> > > <https://www.mokacoding.com/blog/symliks-in-git/> details how you can use
> > > them, and the caveats with regard to needing relative links, not absolute.
> > > The example repo he links to at the end includes a symlink which worked
> > > fine for me when I cloned it. But maybe not relevant given the below:
> >
> > We still have to check if Python packages can have links, but I'm
> > afraid of this mechanism. It is not popular and may cause unexpected
> > consequences.
> >
> > > > > Likewise, someone who spends 99% of their time working in AWS and using
> > > > > all the operators in that subpackage might not think to look in the GCP
> > > > > package the first time they need a GCS to S3 operator. I'm admittedly
> > > > > terrible at documentation, but if duplicating the files via symlinks
> > > > > isn't an option, then is there an easy way we could duplicate the
> > > > > documentation for those operators so they are easily findable in both
> > > > > doc sections?
> > > > >
> > > > Recently, I updated the documentation:
> > > > https://airflow.readthedocs.io/en/latest/integration.html
> > > > We have a list of all integrations in AWS, Azure, GCP. If the operator
> > > > concerns two cloud providers, it is repeated in two places. It's good for
> > > > documentation. The DRY rule is only valid for source code.
> > > > I am working on documentation for other operators.
> > > > My work is part of this ticket:
> > > > https://issues.apache.org/jira/browse/AIRFLOW-5431
> > > >
> > > This updated documentation looks great, definitely heading in a direction
> > > that makes it easier and addresses my concerns. (Although it took me a
> > > while to realize those tables can be scrolled horizontally!)
> > >
> > I'm working on a redesign of the documentation theme. It's part of AIP-11:
> > https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-11+Create+a+Landing+Page+for+Apache+Airflow
> > We are currently at the stage of collecting comments from the first
> > phase - we sent materials to the community, but also conducted tests
> > with real users:
> > https://lists.apache.org/thread.html/6fa1cdceb97ed17752978a8d4202bf1ff1a86c6b50bbc9d09f694166@%3Cdev.airflow.apache.org%3E
>
> --
>
> Jarek Potiuk
> Polidea <https://www.polidea.com/> | Principal Software Engineer
>
> M: +48 660 796 129
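
To make the question at the top of this message concrete, here is a rough Python sketch of the two placement rules being compared: the "source provider" rule from the thread, and a "storage is secondary" variant. Everything in it is hypothetical. The function names, the set of services treated as storage, and the service-to-package mapping (including "mssql" as a package name) are assumptions for illustration only, not Airflow code and not anything AIP-21 specifies.

```python
# Hypothetical sketch: none of these names exist in Airflow.

# Assumption: which services count as "just storage".
STORAGE_SERVICES = {"s3", "gcs"}

# Assumption: which provider package each service belongs to.
PROVIDER_OF = {
    "s3": "aws",
    "gcs": "google",
    "mssql": "mssql",
}


def provider_by_source(source: str, destination: str) -> str:
    """Rule from the thread: a transfer operator lives with its source."""
    return PROVIDER_OF[source]


def provider_storage_secondary(source: str, destination: str) -> str:
    """Variant: a storage-only source does not decide the location.

    If the source is plain storage and the destination is not, the
    operator goes with the destination. Otherwise the source rule applies.
    """
    source_is_storage = source in STORAGE_SERVICES
    destination_is_storage = destination in STORAGE_SERVICES
    if source_is_storage and not destination_is_storage:
        return PROVIDER_OF[destination]
    return PROVIDER_OF[source]


if __name__ == "__main__":
    for src, dst in [("s3", "mssql"), ("gcs", "s3"), ("mssql", "s3")]:
        print(
            f"{src} -> {dst}:",
            "source rule =", provider_by_source(src, dst),
            "| storage-secondary rule =", provider_storage_secondary(src, dst),
        )
```

Under these assumptions the two rules only disagree on the S3-to-MSSQL case: the source rule puts it under `aws`, the storage-secondary rule puts it under `mssql`. GCS-to-S3 stays with its source either way, since both ends are storage.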