Fine :). Let it be then :)

On Thu, Sep 1, 2022 at 6:40 AM Abhishek Bhakat
<[email protected]> wrote:

> I would like to vote for ExternalPythonOperator.
> Because usually a virtualenv has symbolic links for the Python binaries
> unless --copies is used to make it fully portable.
> Additionally, there is the option to use a differently compiled Python
> altogether (for example pypy <https://www.pypy.org/index.html> or jython
> <https://www.jython.org/>). Naming these "External Pythons" makes more
> sense to me.
>
> Thanks,
> Abhishek
>
> On 31-Aug-2022 at 9:30:42 PM, Ash Berlin-Taylor <[email protected]> wrote:
>
>> Personally, of those two I greatly prefer ExternalPythonOperator. (I
>> didn't vote for either of those)
>>
>> (Also I think PythonExternalEnvOperator would be the "correct" casing,
>> Virtualenv is a thing in python, Externalenv isn't.)
>>
>> -ash
>>
>> On 31 August 2022 21:28:20 BST, Jarek Potiuk <[email protected]> wrote:
>>>
>>> We've got 56 votes (wow!)
>>>
>>> ExternalPythonOperator won with 41%, followed by
>>> PythonExternalenvOperator with 30% and PythonRunenvOperator with 26%.
>>>
>>> I am fine with either of those. But - despite slightly lower support - I
>>> think PythonExternalenvOperator reflects a bit better the resemblance to
>>> PythonVirtualenvOperator that I think is important.
>>>
>>> Asking those who were very strong on ExternalPythonOperator - is
>>> PythonExternalenvOperator "good enough" for you as well?
>>>
>>> The poll had only one option to choose from, but if that is an
>>> acceptable option for those who favoured "ExternalPythonOperator" - I have
>>> personally a slight preference for that one.
>>>
>>> J.
>>>
>>>
>>>
>>>
>>> On Wed, Aug 31, 2022 at 3:10 PM Jarek Potiuk <[email protected]> wrote:
>>>
>>>> Just 5 hours left to change the world!
>>>>
>>>> You can become one of the people who influenced the decision on naming
>>>> the new operator :D
>>>>
>>>> https://twitter.com/jarekpotiuk/status/1563602012100767746
>>>>
>>>> (Right, maybe changing the world just a little, but still)
>>>>
>>>> J.
>>>>
>>>>
>>>> On Sat, Aug 27, 2022 at 9:01 PM Jarek Potiuk <[email protected]> wrote:
>>>>
>>>>> Seems we are only now at the stage that we need to choose the best
>>>>> name for the operator
>>>>>
>>>>> I started a name poll on Twitter :)
>>>>>
>>>>> https://twitter.com/jarekpotiuk/status/1563602012100767746
>>>>>
>>>>> PR here: https://github.com/apache/airflow/pull/25780
>>>>>
>>>>> J.
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Aug 18, 2022 at 1:53 AM Jarek Potiuk <[email protected]> wrote:
>>>>>
>>>>>> Draft PR - needs some more tests and review with typing changes - in
>>>>>> https://github.com/apache/airflow/pull/25780
>>>>>> Eventually PythonExternalOperator seems like a good name.
>>>>>>
>>>>>> J.
>>>>>>
>>>>>>
>>>>>> On Wed, Aug 17, 2022 at 10:37 PM Jeambrun Pierre <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> I also like the ability to use a specific interpreter.
>>>>>>>
>>>>>>> Maybe we could leave everything that is env related to the PVO (even
>>>>>>> using an existing one) and let another one handle the interpreter.
>>>>>>>
>>>>>>> As Ash mentioned, I also feel that an additional parameter
>>>>>>> (python/interpreter etc.) on the PO would make sense and be quite
>>>>>>> intuitive, rather than a completely new operator, but it might be
>>>>>>> harder to implement.
>>>>>>>
>>>>>>> Best
>>>>>>> Pierre Jeambrun
>>>>>>>
>>>>>>> Le mer. 17 août 2022 à 20:46, Collin McNulty
>>>>>>> <[email protected]> a écrit :
>>>>>>>
>>>>>>>> I concur that this would be very useful. I can see a common pattern
>>>>>>>> being to have a task to create an environment if it does not already
>>>>>>>> exist and then subsequent tasks use that environment.
>>>>>>>>
>>>>>>>> On Wed, Aug 17, 2022 at 12:30 PM Jarek Potiuk <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Sounds like this is really in the middle between PVO and PO :).
>>>>>>>>>
>>>>>>>>> BTW. I spoke with a customer of mine today and they said they would
>>>>>>>>> ABSOLUTELY love it. They were actually blocked from migrating to
>>>>>>>>> 2.3.3 because one of their teams needed a DBT environment while the
>>>>>>>>> other team needed some other dependency, and the two conflict with
>>>>>>>>> each other. They are using Nomad + Docker already, and while
>>>>>>>>> extending the image with another venv is super-easy for them, they
>>>>>>>>> were considering building several Docker images to serve their
>>>>>>>>> users. That is an order of magnitude more complex problem for them,
>>>>>>>>> because they would have to make a whole new pipeline to build and
>>>>>>>>> distribute multiple images and implement a queue-based split between
>>>>>>>>> the teams, or switch to using DockerOperator.
>>>>>>>>>
>>>>>>>>> This one will allow them to do a limited version of multi-tenancy for
>>>>>>>>> their teams - without the actual separation but with even more
>>>>>>>>> fine-grained separation of envs - because they would be able to use
>>>>>>>>> different deps even for different tasks in the same DAG.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> J,
>>>>>>>>>
>>>>>>>>> On Wed, Aug 17, 2022 at 6:21 PM Ash Berlin-Taylor <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>> >
>>>>>>>>> > Another option would be to change the PythonOperator/@task to
>>>>>>>>> take a `python` argument (which would also change the behaviour of
>>>>>>>>> _that_ operator a lot, with or without that argument, if we did that.)
>>>>>>>>> >
>>>>>>>>> > On 17 August 2022 15:46:52 BST, Jarek Potiuk <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>> >>
>>>>>>>>> >> Yeah. TP - I like that explicit separation. It's much cleaner.
>>>>>>>>> >> I still have to think about the name though. While I see where
>>>>>>>>> >> ExternalPythonOperator comes from, it sounds a bit less than
>>>>>>>>> >> obvious. I think the name should somehow contain "Environment"
>>>>>>>>> >> because very few people realise that running Python from a
>>>>>>>>> >> virtualenv actually implicitly "activates" the venv.
>>>>>>>>> >> Maybe we could deprecate the old PythonVirtualenvOperator and
>>>>>>>>> >> introduce two new operators: PythonInCreatedVirtualEnvOperator and
>>>>>>>>> >> PythonInExistingVirtualEnvOperator? Not exactly those names - they
>>>>>>>>> >> are too long - but something like that. Maybe we should get rid of
>>>>>>>>> >> Python in the name altogether?
>>>>>>>>> >>
>>>>>>>>> >> BTW. I think we should generally do more of the discussions here
>>>>>>>>> >> and express our thoughts about Airflow here. Even if there are no
>>>>>>>>> >> answers or interest immediately, I think that it makes sense to do
>>>>>>>>> >> a bit of a melting pot that sometimes might produce some cool (or
>>>>>>>>> >> rather hot) stuff as a result.
>>>>>>>>> >>
>>>>>>>>> >> On Wed, Aug 17, 2022 at 8:45 AM Tzu-ping Chung
>>>>>>>>> <[email protected]> wrote:
>>>>>>>>> >>>
>>>>>>>>> >>>
>>>>>>>>> >>>  One thing I thought of (but never bothered to write about) is
>>>>>>>>> to introduce a separate operator instead, say ExternalPythonOperator
>>>>>>>>> (bikeshedding on the name is welcome), that explicitly takes a path to
>>>>>>>>> the interpreter (say in a virtual environment) and just uses that to
>>>>>>>>> run the code. This also enables users to create a virtual environment
>>>>>>>>> upfront, but avoids needing to overload PythonVirtualenvOperator for
>>>>>>>>> the purpose. This also opens an extra use case: you can use any Python
>>>>>>>>> installation to run the code (say a custom-compiled interpreter),
>>>>>>>>> although nobody asked about that.
>>>>>>>>> >>>
>>>>>>>>> >>>  TP
>>>>>>>>> >>>
>>>>>>>>> >>>
>>>>>>>>> >>>  On 13 Aug 2022, at 02:52, Jeambrun Pierre <
>>>>>>>>> [email protected]> wrote:
>>>>>>>>> >>>
>>>>>>>>> >>>  I feel like this is a great alternative at the price of a
>>>>>>>>> very moderate effort. (I'd be glad to help with it).
>>>>>>>>> >>>
>>>>>>>>> >>>  Mutually exclusive sounds good to me as well.
>>>>>>>>> >>>
>>>>>>>>> >>>  Best,
>>>>>>>>> >>>  Pierre
>>>>>>>>> >>>
>>>>>>>>> >>>  Le ven. 12 août 2022 à 15:23, Jarek Potiuk <[email protected]>
>>>>>>>>> a écrit :
>>>>>>>>> >>>>
>>>>>>>>> >>>>
>>>>>>>>> >>>>  Mutually exclusive. I think that has the nice property of
>>>>>>>>> forcing people to prepare immutable venvs upfront.
>>>>>>>>> >>>>
>>>>>>>>> >>>>  On Fri, Aug 12, 2022 at 3:15 PM Ash Berlin-Taylor <
>>>>>>>>> [email protected]> wrote:
>>>>>>>>> >>>>>
>>>>>>>>> >>>>>
>>>>>>>>> >>>>>  Yes, this has been on my background idea list for an age --
>>>>>>>>> I'd love to see it happen!
>>>>>>>>> >>>>>
>>>>>>>>> >>>>>  Have you thought about how it would behave when you specify
>>>>>>>>> an existing virtualenv and include requirements in the operator that
>>>>>>>>> are not already installed there? Or would they be mutually exclusive?
>>>>>>>>> (I don't mind either way, just wondering which way you are heading)
>>>>>>>>> >>>>>
>>>>>>>>> >>>>>  -ash
>>>>>>>>> >>>>>
>>>>>>>>> >>>>>  On Fri, Aug 12 2022 at 14:58:44 +02:00:00, Jarek Potiuk <
>>>>>>>>> [email protected]> wrote:
>>>>>>>>> >>>>>
>>>>>>>>> >>>>>  Hello everyone,
>>>>>>>>> >>>>>
>>>>>>>>> >>>>>  TL;DR: I propose to extend our PythonVirtualenvOperator
>>>>>>>>> with a "use existing venv" feature and make it a viable way of handling
>>>>>>>>> some multi-dependency sets using multiple pre-installed venvs.
>>>>>>>>> >>>>>
>>>>>>>>> >>>>>  More context:
>>>>>>>>> >>>>>
>>>>>>>>> >>>>>  I had this idea coming after a discussion in our Slack:
>>>>>>>>> https://apache-airflow.slack.com/archives/CCV3FV9KL/p1660233834355179
>>>>>>>>> >>>>>
>>>>>>>>> >>>>>  My thoughts were - why don't we add support for "use
>>>>>>>>> existing venv" in PythonVirtualenvOperator as a first-class citizen?
>>>>>>>>> >>>>>
>>>>>>>>> >>>>>  Currently (unless there are some tricks I am not aware of, or
>>>>>>>>> you extend PVO), the PVO will always attempt to create a virtualenv
>>>>>>>>> based on extra requirements. And while it gives the users a possibility
>>>>>>>>> of having some tasks use different dependencies, the drawback is that
>>>>>>>>> the venv is created dynamically when the task starts - potentially a
>>>>>>>>> lot of startup-time overhead and some unpleasant failure scenarios,
>>>>>>>>> like networking problems, PyPI or a local repo not being available, or
>>>>>>>>> automated (and unnoticed) upgrades of dependencies.
>>>>>>>>> >>>>>
>>>>>>>>> >>>>>  Those are basically the same problems that caused us to
>>>>>>>>> strongly discourage our users in our Helm Chart from using
>>>>>>>>> _PIP_ADDITIONAL_DEPENDENCIES in production and to criticize the
>>>>>>>>> Community Helm Chart for the dynamic dependency installation they
>>>>>>>>> promote as a "valid" approach. Yet our PVO currently does exactly this.
>>>>>>>>> >>>>>
>>>>>>>>> >>>>>  We had some past discussions about how this can be improved -
>>>>>>>>> with caching, or using different images for different dependencies and
>>>>>>>>> similar - and we even have the
>>>>>>>>> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-46+Runtime+isolation+for+airflow+tasks+and+dag+parsing
>>>>>>>>> proposal to use different images for different sets of requirements.
>>>>>>>>> >>>>>
>>>>>>>>> >>>>>  Proposal:
>>>>>>>>> >>>>>
>>>>>>>>> >>>>>  During the discussion yesterday I started to think that a
>>>>>>>>> simpler solution is possible, one that is easy for us to implement and
>>>>>>>>> for users to use.
>>>>>>>>> >>>>>
>>>>>>>>> >>>>>  Why not have different venvs preinstalled and let the PVO
>>>>>>>>> choose the one that should be used?
>>>>>>>>> >>>>>
>>>>>>>>> >>>>>  It does not invalidate AIP-46 (AIP-46 serves a slightly
>>>>>>>>> different purpose and some cases cannot be handled this way - when you
>>>>>>>>> need different "system level" dependencies, for example), but it might
>>>>>>>>> be much simpler from the deployment point of view and allow handling
>>>>>>>>> "multi-dependency sets" for Python libraries only with minimal
>>>>>>>>> deployment overhead (which AIP-46 necessarily has). And I think it will
>>>>>>>>> be enough for a vast number of the "multi-dependency-sets" cases.
>>>>>>>>> >>>>>
>>>>>>>>> >>>>>  Why don't we allow the users to prepare those venvs upfront
>>>>>>>>> and simply enable PVO to use them rather than create them dynamically?
>>>>>>>>> >>>>>
>>>>>>>>> >>>>>  Advantages:
>>>>>>>>> >>>>>
>>>>>>>>> >>>>>  * it nicely handles cases where some of your tasks need a
>>>>>>>>> different set of dependencies than others (for execution, not
>>>>>>>>> necessarily parsing, at least initially).
>>>>>>>>> >>>>>
>>>>>>>>> >>>>>  * no startup time overhead needed as with current PVO
>>>>>>>>> >>>>>
>>>>>>>>> >>>>>  * possible to run in both cases - "venv installation" and
>>>>>>>>> "docker image" installation
>>>>>>>>> >>>>>
>>>>>>>>> >>>>>  * it has finer granularity level than AIP-46 - unlike in
>>>>>>>>> AIP-46 you could use different sets of dependencies
>>>>>>>>> >>>>>
>>>>>>>>> >>>>>  * very easy to pull off for the users without modifying
>>>>>>>>> their deployments. For the local venv case, you just create the venvs.
>>>>>>>>> For the Docker image case, your custom image needs to add several lines
>>>>>>>>> similar to:
>>>>>>>>> >>>>>
>>>>>>>>> >>>>>  RUN python -m venv --system-site-packages /opt/venv1 && /opt/venv1/bin/pip install PACKAGE1==NN PACKAGE2==NN
>>>>>>>>> >>>>>  RUN python -m venv --system-site-packages /opt/venv2 && /opt/venv2/bin/pip install PACKAGE1==NN PACKAGE2==NN
>>>>>>>>> >>>>>
>>>>>>>>> >>>>>  and PythonVirtualenvOperator should have an extra
>>>>>>>>> "use_existing_venv=/opt/venv2" parameter
>>>>>>>>> >>>>>
>>>>>>>>> >>>>>  * we only need to manage ONE image (!) even if you have
>>>>>>>>> multiple sets of dependencies. This has the advantage that it is
>>>>>>>>> actually LOWER overhead than having separate images for each env when
>>>>>>>>> it comes to various resources (the same workers could handle multiple
>>>>>>>>> dependency sets, for example, and the same image is reused by multiple
>>>>>>>>> Pods in K8S, etc.).
>>>>>>>>> >>>>>
>>>>>>>>> >>>>>  * later, when AIP-43 (separate dag processor with ability
>>>>>>>>> to use different processors for different subdirectories) is completed
>>>>>>>>> and AIP-46 is approved/implemented, we could also extend DAG parsing to
>>>>>>>>> be able to use those predefined venvs for parsing. That would eliminate
>>>>>>>>> the need for local imports and add support for even using different
>>>>>>>>> sets of libraries in top-level code (per DAG, not per task). It would
>>>>>>>>> not solve different "system" level dependencies - and for that AIP-46
>>>>>>>>> is still a very valid case.
>>>>>>>>> >>>>>
>>>>>>>>> >>>>>  Disadvantages:
>>>>>>>>> >>>>>
>>>>>>>>> >>>>>  I thought very hard about this one and I actually could not
>>>>>>>>> find any disadvantages :)
>>>>>>>>> >>>>>
>>>>>>>>> >>>>>  It's simple to implement, use and explain, it can be
>>>>>>>>> implemented very quickly (like - in a few hours with tests and
>>>>>>>>> documentation I think) and performance-wise it is better than any other
>>>>>>>>> solution (including AIP-46), provided that the case is limited to
>>>>>>>>> different Python dependencies.
>>>>>>>>> >>>>>
>>>>>>>>> >>>>>  But possibly there are things that I missed. It all looks
>>>>>>>>> too good to be true, and I wonder why we do not have it already today -
>>>>>>>>> once I thought about it, it seems very obvious. So I probably missed
>>>>>>>>> something.
>>>>>>>>> >>>>>
>>>>>>>>> >>>>>  WDYT?
>>>>>>>>> >>>>>
>>>>>>>>> >>>>>  J.
>>>>>>>>> >>>>>
>>>>>>>>> >>>>>
>>>>>>>>> >>>>>
>>>>>>>>> >>>>>
>>>>>>>>> >>>>>
>>>>>>>>> >>>>>
>>>>>>>>> >>>>>
>>>>>>>>> >>>
>>>>>>>>>
>>>>>>>> --
>>>>>>>>
>>>>>>>> Collin McNulty
>>>>>>>> Lead Airflow Engineer
>>>>>>>>
>>>>>>>> Email: [email protected] <[email protected]>
>>>>>>>> Time zone: US Central (CST UTC-6 / CDT UTC-5)
>>>>>>>>
>>>>>>>>
>>>>>>>> <https://www.astronomer.io/>
>>>>>>>>
>>>>>>>

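For readers skimming the thread, the core idea (take a path to an existing interpreter and run the task code with it) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code from the PR: run_in_external_python is a hypothetical helper name, and passing arguments and results as JSON is a simplification chosen to keep the example short. A path such as /opt/venv2/bin/python, as in the Docker example quoted above, would be the typical value for python_path.

```python
import json
import subprocess
import sys
import textwrap


def run_in_external_python(python_path, source, func_name, *args):
    """Run func_name(*args), defined in `source`, with the interpreter at python_path.

    Hypothetical helper for illustration only; it is not Airflow's implementation.
    Arguments and the return value must be JSON-serializable.
    """
    # Build a small script: the user's source followed by a call that prints
    # the JSON-encoded result as the last line of stdout.
    script = textwrap.dedent(source) + textwrap.dedent(
        f"""
        import json, sys
        print(json.dumps({func_name}(*json.loads(sys.argv[1]))))
        """
    )
    completed = subprocess.run(
        [python_path, "-c", script, json.dumps(args)],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the external interpreter fails
    )
    # Only the last line is the result; earlier lines may be prints from the task.
    return json.loads(completed.stdout.splitlines()[-1])


SOURCE = """
def add(a, b):
    return a + b
"""

if __name__ == "__main__":
    # sys.executable stands in for a pre-built venv's interpreter here.
    print(run_in_external_python(sys.executable, SOURCE, "add", 2, 3))
```

Because the interpreter inside a venv already resolves that venv's site-packages, no separate "activate" step is needed; pointing at the venv's python binary is enough.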