Personally if those two I greatly prefer ExternalPythonOperator. (I didn't vote 
for either of those)

(Also I think PythonExternalEnvOperator would be the "correct" casing, 
Virtualenv is a thing in python, Externalenv isn't.)

-ash 

On 31 August 2022 21:28:20 BST, Jarek Potiuk <ja...@potiuk.com> wrote:
>We've got 56 votes (wow!)
>
>ExternalPythonOperator won. It got 41% . Followed by
>PythonExternalenvOperator 30% and PythonRunenvOperator with 26%.
>
>I am fine with either of those. But - despite slightly lower support - I
>think PythonExternalenvOperator reflects a bit better the resemblance to
>PythonVirtualenvOperator that I think is important.
>
>Asking those who were very strong on ExternalPythonOperator - is
>PythonExternalenvOperator "good enough" for you as well?
>
>The poll had only one option to choose from, but if that is an acceptable
>option for those who favoured "ExternalPythonOperator" - I have personally
>a slight preference for that one.
>
>J.
>
>
>
>
>On Wed, Aug 31, 2022 at 3:10 PM Jarek Potiuk <ja...@potiuk.com> wrote:
>
>> Just 5 hours left to change the world!
>>
>> You can become one of the people who influenced the decision on naming the
>> new operator :D
>>
>> https://twitter.com/jarekpotiuk/status/1563602012100767746
>>
>> (Right, maybe changing the world just a little, but still)
>>
>> J.
>>
>>
>> On Sat, Aug 27, 2022 at 9:01 PM Jarek Potiuk <ja...@potiuk.com> wrote:
>>
>>> Seems we are only now at the stage that we need to choose the best name
>>> for the operator
>>>
>>> I started a name poll on Twitter :)
>>>
>>> https://twitter.com/jarekpotiuk/status/1563602012100767746
>>>
>>> PR here: https://github.com/apache/airflow/pull/25780
>>>
>>> J.
>>>
>>>
>>>
>>> On Thu, Aug 18, 2022 at 1:53 AM Jarek Potiuk <ja...@potiuk.com> wrote:
>>>
>>>> Draft PR - needs some more tests and review with typing changes - in
>>>> https://github.com/apache/airflow/pull/25780
>>>> Eventually PythonExternalOperator seems like a good name.
>>>>
>>>> J.
>>>>
>>>>
>>>> On Wed, Aug 17, 2022 at 10:37 PM Jeambrun Pierre <pierrejb...@gmail.com>
>>>> wrote:
>>>>
>>>>> I also like the ability to use a specific interpreter.
>>>>>
>>>>> Maybe we could leave everything that is env related to the PVO (even
>>>>> using an existing one) and let another one handle the interpreter.
>>>>>
>>>>> As Ash mentioned I also feel like an additional parameter
>>>>> (python/interpreter etc.) to the PO would make sense and is quite 
>>>>> intuitive
>>>>> rather than a complete new operator, but it might be harder to implement.
>>>>>
>>>>> Best
>>>>> Pierre Jeambrun
>>>>>
>>>>> Le mer. 17 août 2022 à 20:46, Collin McNulty
>>>>> <col...@astronomer.io.invalid> a écrit :
>>>>>
>>>>>> I concur that this would be very useful. I can see a common pattern
>>>>>> being to have a task to create an environment if it does not already 
>>>>>> exist
>>>>>> and then subsequent tasks use that environment.
>>>>>>
>>>>>> On Wed, Aug 17, 2022 at 12:30 PM Jarek Potiuk <ja...@potiuk.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Sounds like this is really in the middle between PVO and PO :).
>>>>>>>
>>>>>>> BTW. I spoke with a customer of mine today and they said they would
>>>>>>> ABSOLUTELY love it. They were actually blocked from migrating to 2.3.3
>>>>>>> because one of their teams needed a DBT environment while the other
>>>>>>> team needed some other dependency and they are conflicting with each
>>>>>>> other. They are using Nomad + Docker already and while extending the
>>>>>>> image with another venv is super-easy for them, they were considering
>>>>>>> building several Docker images to serve their users but it is an order
>>>>>>> of magnitude more complex problem for them because they would have to
>>>>>>> make a whole new pipeline to build a distribute multiple images and
>>>>>>> implements queue-base split between the teams or switch to using
>>>>>>> DockerOperator.
>>>>>>>
>>>>>>> This one will allow them to do limited version of multi-tenancy for
>>>>>>> their teams - without the actual separation but with even more
>>>>>>> fine-grained separation of envs - because they would be able to use
>>>>>>> different deps even for different tasks in the same DAG.
>>>>>>>
>>>>>>>
>>>>>>> J,
>>>>>>>
>>>>>>> On Wed, Aug 17, 2022 at 6:21 PM Ash Berlin-Taylor <a...@apache.org>
>>>>>>> wrote:
>>>>>>> >
>>>>>>> > Another option would be to change the PythonOperator/@task to take
>>>>>>> a `python` argument (which also does change the behaviour of _that_
>>>>>>> operator a lot with or without that argument if we did that.)
>>>>>>> >
>>>>>>> > On 17 August 2022 15:46:52 BST, Jarek Potiuk <ja...@potiuk.com>
>>>>>>> wrote:
>>>>>>> >>
>>>>>>> >> Yeah. TP - I like that explicit separation. It's much cleaner. I
>>>>>>> still
>>>>>>> >> have to think about the name though. While I see where
>>>>>>> >> ExternalPythonOperator comes from,  It sounds a bit less than
>>>>>>> obvious.
>>>>>>> >> I think the name should somehow contain "Environment" because very
>>>>>>> few
>>>>>>> >> people realise that running Python from a virtualenv actually
>>>>>>> >> implicitly "activates" the venv.
>>>>>>> >> I think maybe deprecating the old PythonVirtualenvOperator and
>>>>>>> >> introducing two new operators: PythonInCreatedVirtualEnvOperator,
>>>>>>> >> PythonInExistingVirtualEnvOperator ? Not exactly those names - they
>>>>>>> >> are too long - but something like that. Maybe we should get rid of
>>>>>>> >> Python in the name at all ?
>>>>>>> >>
>>>>>>> >> BTW. I think we should generally do more of the discussions here
>>>>>>> and
>>>>>>> >> express our thoughts about Airflow here. Even if there are no
>>>>>>> answers
>>>>>>> >> or interest immediately, I think that it makes sense to do a bit
>>>>>>> of a
>>>>>>> >> melting pot that sometimes might produce some cool (or rather hot)
>>>>>>> >> stuff as a result.
>>>>>>> >>
>>>>>>> >> On Wed, Aug 17, 2022 at 8:45 AM Tzu-ping Chung
>>>>>>> <t...@astronomer.io.invalid> wrote:
>>>>>>> >>>
>>>>>>> >>>
>>>>>>> >>>  One thing I thought of (but never bothered to write about) is to
>>>>>>> introduce a separate operator instead, say ExternalPythonOperator (bike
>>>>>>> shedding on name is welcomed), that explicitly takes a path to the
>>>>>>> interpreter (say in a virtual environment) and just use that to run the
>>>>>>> code. This also enables users to create a virtual environment upfront, 
>>>>>>> but
>>>>>>> avoids needing to overload PythonVirtualenvOperator for the purpose. 
>>>>>>> This
>>>>>>> also opens an extra use case that you can use any Python installation to
>>>>>>> run the code (say a custom-compiled interpreter), although nobody asked
>>>>>>> about that.
>>>>>>> >>>
>>>>>>> >>>  TP
>>>>>>> >>>
>>>>>>> >>>
>>>>>>> >>>  On 13 Aug 2022, at 02:52, Jeambrun Pierre <pierrejb...@gmail.com>
>>>>>>> wrote:
>>>>>>> >>>
>>>>>>> >>>  I feel like this is a great alternative at the price of a very
>>>>>>> moderate effort. (I'd be glad to help with it).
>>>>>>> >>>
>>>>>>> >>>  Mutually exclusive sounds good to me as well.
>>>>>>> >>>
>>>>>>> >>>  Best,
>>>>>>> >>>  Pierre
>>>>>>> >>>
>>>>>>> >>>  Le ven. 12 août 2022 à 15:23, Jarek Potiuk <ja...@potiuk.com> a
>>>>>>> écrit :
>>>>>>> >>>>
>>>>>>> >>>>
>>>>>>> >>>>  Mutually exclusive. I think that has the nice property of
>>>>>>> forcing people to prepare immutable venvs upfront.
>>>>>>> >>>>
>>>>>>> >>>>  On Fri, Aug 12, 2022 at 3:15 PM Ash Berlin-Taylor <
>>>>>>> a...@apache.org> wrote:
>>>>>>> >>>>>
>>>>>>> >>>>>
>>>>>>> >>>>>  Yes, this has been on my background idea list for an age --
>>>>>>> I'd love to see it happen!
>>>>>>> >>>>>
>>>>>>> >>>>>  Have you thought about how it would behave when you specify an
>>>>>>> existing virtualenv and include requirements in the operator that are 
>>>>>>> not
>>>>>>> already installed there? Or would they be mutually exclusive? (I don't 
>>>>>>> mind
>>>>>>> either way, just wondering which way you are heading)
>>>>>>> >>>>>
>>>>>>> >>>>>  -ash
>>>>>>> >>>>>
>>>>>>> >>>>>  On Fri, Aug 12 2022 at 14:58:44 +02:00:00, Jarek Potiuk <
>>>>>>> ja...@potiuk.com> wrote:
>>>>>>> >>>>>
>>>>>>> >>>>>  Hello everyone,
>>>>>>> >>>>>
>>>>>>> >>>>>  TL;DR; I propose to extend our PythonVirtualenvOperator with
>>>>>>> "use existing venv" feature and make it a viable way of handling some
>>>>>>> multi-dependency sets using multiple pre-installed venvs.
>>>>>>> >>>>>
>>>>>>> >>>>>  More context:
>>>>>>> >>>>>
>>>>>>> >>>>>  I had this idea coming after a discussion in our Slack:
>>>>>>> https://apache-airflow.slack.com/archives/CCV3FV9KL/p1660233834355179
>>>>>>> >>>>>
>>>>>>> >>>>>  My thoughts were - why don't we add support for "use existing
>>>>>>> venv" in PythonVirtualenvOperator as first-class-citizen ?
>>>>>>> >>>>>
>>>>>>> >>>>>  Currently (unless there are some tricks I am not aware of) or
>>>>>>> extend PVO, the PVO will always attempt to create a virtualenv based on
>>>>>>> extra requirements. And while it gives the users a possibility of having
>>>>>>> some tasks use different dependencies, the drawback is that the venv is
>>>>>>> created dynamically when tasks starts - potentially a lot of overhead 
>>>>>>> for
>>>>>>> startup time and some unpleasant failure scenarios - like networking
>>>>>>> problems, PyPI or local repoi not available, automated (and unnoticed)
>>>>>>> upgrade of dependencies.
>>>>>>> >>>>>
>>>>>>> >>>>>  Those are basically the same problems that caused us to
>>>>>>> strongly discourage our users in our Helm Chart to use
>>>>>>> _PIP_ADDITIONAL_DEPENDENCIES in production and criticize the  Community
>>>>>>> Helm Chart for dynamic dependency installation they promote as a "valid"
>>>>>>> approach. Yet our PVO currently does exactly this.
>>>>>>> >>>>>
>>>>>>> >>>>>  We had some past discussions how this can be improved - with
>>>>>>> caching, or using different images for different dependencies and 
>>>>>>> similar -
>>>>>>> and even we have
>>>>>>> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-46+Runtime+isolation+for+airflow+tasks+and+dag+parsing
>>>>>>> proposal to use different images for different sets of requirements.
>>>>>>> >>>>>
>>>>>>> >>>>>  Proposal:
>>>>>>> >>>>>
>>>>>>> >>>>>  During the discussion yesterday I started to think a simpler
>>>>>>> solution is possible and rather simple to implement by us and for users 
>>>>>>> to
>>>>>>> use.
>>>>>>> >>>>>
>>>>>>> >>>>>  Why not have different venvs preinstalled and let the PVO
>>>>>>> choose the one that should be used?
>>>>>>> >>>>>
>>>>>>> >>>>>  It does not invalidate AIP-46. AIP-46 serves a bit different
>>>>>>> purpose and some cases cannot be handled this way - when you need 
>>>>>>> different
>>>>>>> "system level" dependencies for example) but it might be much simpler 
>>>>>>> from
>>>>>>> deployment point of view and allow it to handle "multi-dependency sets" 
>>>>>>> for
>>>>>>> Python libraries only with minimal deployment overhead (which AIP-46
>>>>>>> necessarily has). And I think it will be enough for a vast number of the
>>>>>>> "multi-dependency-sets" cases.
>>>>>>> >>>>>
>>>>>>> >>>>>  Why don't we allow the users to prepare those venvs upfront
>>>>>>> and simply enable PVE to use them rather than create them dynamically ?
>>>>>>> >>>>>
>>>>>>> >>>>>  Advantages:
>>>>>>> >>>>>
>>>>>>> >>>>>  * it nicely handles cases where some of your tasks need a
>>>>>>> different set of dependencies than others (for execution, not 
>>>>>>> necessarily
>>>>>>> parsing at least initially).
>>>>>>> >>>>>
>>>>>>> >>>>>  * no startup time overhead needed as with current PVO
>>>>>>> >>>>>
>>>>>>> >>>>>  * possible to run in both cases - "venv installation" and
>>>>>>> "docker image" installation
>>>>>>> >>>>>
>>>>>>> >>>>>  * it has finer granularity level than AIP-46 - unlike in
>>>>>>> AIP-46 you could use different sets of dependencies
>>>>>>> >>>>>
>>>>>>> >>>>>  * very easy to pull off for the users without modifying their
>>>>>>> deployments,For local venv, you just create the venvs, For Docker image
>>>>>>> case, your custom image needs to add several lines similar to:
>>>>>>> >>>>>
>>>>>>> >>>>>  RUN python -m venv --system-site-packages PACKAGE1==NN
>>>>>>> PACKAGE2==NN /opt/venv1
>>>>>>> >>>>>  RUN python -m venv --system-site-packages PACKAGE1==NN
>>>>>>> PACKAGE2==NN /opt/venv2
>>>>>>> >>>>>
>>>>>>> >>>>>  and PythonVenvOperator should have extra
>>>>>>> "use_existing_venv=/opt/venv2") parameter
>>>>>>> >>>>>
>>>>>>> >>>>>  * we only need to manage ONE image (!) even if you have
>>>>>>> multiple sets of dependencies (this has the advantage that it is 
>>>>>>> actually
>>>>>>> LOWER overhead than having separate images for each env -when it comes 
>>>>>>> to
>>>>>>> various resources overhead (same workers could handle multiple 
>>>>>>> dependency
>>>>>>> sets for examples, same image is reused by multiple PODs in K8S etc. ).
>>>>>>> >>>>>
>>>>>>> >>>>>  * later (when AIP-43 (separate dag processor with ability to
>>>>>>> use different processors for different subdirectories) is completed and
>>>>>>> AIP-46 is approved/implemented, we could also extend DAG Parsing to be 
>>>>>>> able
>>>>>>> to use those predefined venvs for parsing. That would eliminate the need
>>>>>>> for local imports and add support to even use different sets of 
>>>>>>> libraries
>>>>>>> in top-level code (per DAG, not per task). It would not solve different
>>>>>>> "system" level dependencies - and for that AiP-46 is still a very valid
>>>>>>> case.
>>>>>>> >>>>>
>>>>>>> >>>>>  Disadvantages:
>>>>>>> >>>>>
>>>>>>> >>>>>  I thought very hard about this one and I actually could not
>>>>>>> find any disadvantages :)
>>>>>>> >>>>>
>>>>>>> >>>>>  It's simple to implement, use and explain, it can be
>>>>>>> implemented very quickly (like - in a few hours with tests and
>>>>>>> documentation I think) and performance-wise it is better for any other
>>>>>>> solution (including AIP-46) providing that the case is limited to 
>>>>>>> different
>>>>>>> Python dependencies.
>>>>>>> >>>>>
>>>>>>> >>>>>  But possibly there are things that I missed. It all looks too
>>>>>>> good to be true, and I wonder why we do not have it already today - 
>>>>>>> once I
>>>>>>> thought about it, it seems very obvious. So I probably missed something.
>>>>>>> >>>>>
>>>>>>> >>>>>  WDYT?
>>>>>>> >>>>>
>>>>>>> >>>>>  J.
>>>>>>> >>>>>
>>>>>>> >>>>>
>>>>>>> >>>>>
>>>>>>> >>>>>
>>>>>>> >>>>>
>>>>>>> >>>>>
>>>>>>> >>>>>
>>>>>>> >>>
>>>>>>>
>>>>>> --
>>>>>>
>>>>>> Collin McNulty
>>>>>> Lead Airflow Engineer
>>>>>>
>>>>>> Email: col...@astronomer.io <john....@astronomer.io>
>>>>>> Time zone: US Central (CST UTC-6 / CDT UTC-5)
>>>>>>
>>>>>>
>>>>>> <https://www.astronomer.io/>
>>>>>>
>>>>>

Reply via email to