And yet another update: - after seeing how it works I will remove
requirement generation from pre-commit - now that it needs to be
generated separately for different versions of python it's a bit too
much overhead (you'd need to have more images downloaded for different
python versions for pre-commit). Instaad I will add breeze commands to
re-ggenerate the requirements (and bash scripts if you do not use
breeze), and anyone changing setup.py will have to do it (otherwise CI
builds will fail). I think this workflow will be great to keep our
requirements up-to-date and have a stable installation method.

J.


On Mon, Mar 23, 2020 at 5:54 PM Jarek Potiuk <jarek.pot...@polidea.com> wrote:
>
> Update - It seems that we won't need the -pinned version eventually. I 
> realized that we need to have slightly different requirements for different 
> python versions.
>
> I just added PR for that: https://github.com/apache/airflow/pull/7841
>
> I also found out (during production image exercise) that we can install 
> airflow predictably in a very simple way (once we release the requirements in 
> 1.10.10):
>
> pip install apache-airflow[gcp]==1.10.10  --constraint 
> https://raw.githubusercontent.com/apache/airflow/1.10.10/requirements/requirements-python3.7.txt
>
> I think this is simple enough to be used as installation method. I added it 
> to the documentation and I think I am ok with dropping -pinned package 
> altogether.
>
> J.
>
>
> On Sun, Mar 22, 2020 at 10:15 AM Jarek Potiuk <jarek.pot...@polidea.com> 
> wrote:
>>
>> Yesterday we had another master breakage - this time from elasticsearch 
>> releasing MINOR version 7.6 breaking our builds (not it was MINOR version so 
>> should be compatible .... it was not for us). I fixed it quickly yesterday 
>> by limiting it to < 7.6 but for me - this is quite clear that trying to rely 
>> on SemVer being followed by others is a futile effort (at least in python's 
>> world).
>>
>> The theory is nice, but it breaks in practice. And it's not really a fault 
>> of the library maintainers. It's simply sometimes not so easy to see how 
>> your APIs are used - and in Python, you cannot prevent using stuff that you 
>> think is an internal detail. This is what happened in elasticsearch case 
>> yesterday - apparently, our plugin was using an "internal" API unknowingly 
>> and some parameters from that API were dropped during refactoring of 
>> elasticsearch library.
>>
>> My observation (it's anecdotal though) is that the COVID-19 situation made 
>> people have more time, fewer distractions, fewer things to do, and we have 
>> higher frequency of OSS packages being released recently so we should 
>> protect a bit from more often breakages.
>>
>> I think  learning from yesterday is:
>>
>> * we should merge the requirements.txt solution quickly to prevent further 
>> breakages (I am reading and testing it now) - I think everyone agrees it's 
>> good to have it
>> * I think we can continue discussing whether apache-airflow-pinned package 
>> should be released or not. I can leave the code building the package but we 
>> can decide about it after some more discussion
>>
>> Does it sound good?
>>
>> J
>>
>>
>>
>> On Fri, Mar 20, 2020 at 2:47 PM Jarek Potiuk <jarek.pot...@polidea.com> 
>> wrote:
>>>
>>> And rebased it right now and fixed automated requirements update.
>>>
>>> On Fri, Mar 20, 2020 at 2:28 PM Jarek Potiuk <jarek.pot...@polidea.com> 
>>> wrote:
>>>>
>>>> Ah BTW. I just noticed that for some reason I pasted an old PR earlier in 
>>>> the thread :(.
>>>> This is the one with requirements.txt I am talking about: 
>>>> https://github.com/apache/airflow/pull/7730
>>>>
>>>> On Fri, Mar 20, 2020 at 2:26 PM Jarek Potiuk <jarek.pot...@polidea.com> 
>>>> wrote:
>>>>>
>>>>> Nope. Not blocking. I can work with my branch just requirements.txt is 
>>>>> enough for that :)
>>>>>
>>>>> I think the problem with semver is that it is loosely followed - we had a 
>>>>> number of breakages in the past with minor version upgrades :(.
>>>>>
>>>>> J.
>>>>>
>>>>>
>>>>>
>>>>> On Fri, Mar 20, 2020 at 1:27 PM Kaxil Naik <kaxiln...@gmail.com> wrote:
>>>>>>
>>>>>> Thanks for the detailed explanation Jarek.
>>>>>>
>>>>>> How about we have an upper limit for all our dependencies, example 
>>>>>> instead
>>>>>> of "google-cloud-storage>=1.16", we have 
>>>>>> "google-cloud-storage>=1.16,<2.0" ?
>>>>>>
>>>>>> If a dependency breaks compatibility in minor versions, we can't do
>>>>>> anything about it but if they follow SemVer, we should be safe and the
>>>>>> first-time installers would have a non-breaking package. WDYT?
>>>>>>
>>>>>> Btw I hope this is not blocking you in building a production image as I
>>>>>> think requirements.txt is solving that? Please let me know if it is
>>>>>> blocking.
>>>>>>
>>>>>> PS: I am also just dumping my ideas to solve this issue. Love to hear 
>>>>>> what
>>>>>> others think too.
>>>>>>
>>>>>> Regards,
>>>>>> Kaxil
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, Mar 19, 2020 at 2:43 PM Jarek Potiuk <jarek.pot...@polidea.com>
>>>>>> wrote:
>>>>>>
>>>>>> > I think we have similar understanding. But let me just clarify because 
>>>>>> > I
>>>>>> > think we think about we think about solving two different problems
>>>>>> > My proposal is not solving all problems with dependencies - quite the
>>>>>> > contrary, I want to solve just one specific "repeatability" problem - 
>>>>>> > read
>>>>>> > on :)..
>>>>>> >
>>>>>> >    1. A potential source of confusion: using "-pinned" for 
>>>>>> > installation but
>>>>>> > >    using "non-pinned" for DAG development.
>>>>>> > >
>>>>>> >
>>>>>> > This could be confusing indeed - but they are the same in fact -
>>>>>> > just deps might be different over time.
>>>>>> >
>>>>>> >    2. Most of the users would still try to install "apache-airflow" 
>>>>>> > package
>>>>>> > >    that might have been broken for example because of a dependency
>>>>>> > release,
>>>>>> > >    either way, we would still have to suggest them to use "pinned"
>>>>>> > version
>>>>>> > >
>>>>>> >
>>>>>> > True.  I thought we might describe it in the README and make it 
>>>>>> > prominently
>>>>>> > explained. Usually people look at the readme in PyPI when they are
>>>>>> > installing
>>>>>> > stuff and it does not work: https://pypi.org/project/apache-airflow/.
>>>>>> >
>>>>>> > Also - we could of course explain how to use requirements.txt from the
>>>>>> > released
>>>>>> > version when they are installing it. That would be an extra friction 
>>>>>> > point
>>>>>> > though
>>>>>> > and maybe having "always installable" version of airflow is a better
>>>>>> > choice.
>>>>>> >
>>>>>> >    3. If they install "pinned" version, it is no longer a library 
>>>>>> > again,
>>>>>> > >    that is users won't be able to use new NumPy release or 
>>>>>> > > matplotlib for
>>>>>> > >    example. In which case we are just circling back to the same 
>>>>>> > > problem,
>>>>>> > >    "either we risk broken package" while releasing or we risk 
>>>>>> > > potentially
>>>>>> > >    incompatible versions.
>>>>>> > >
>>>>>> >
>>>>>> > Yep. But maybe it's just a question of naming. Maybe even we could name
>>>>>> > this package differently to indicate that this version is a way to 
>>>>>> > quickly
>>>>>> > install
>>>>>> >  airflow but not to do any serious development with it.
>>>>>> >
>>>>>> > So speaking about THE problem I want to solve with the
>>>>>> > requirements.txt and apache-airflow-pinned package:
>>>>>> >
>>>>>> > I really only want to solve "first-time-user" experience here - nothing
>>>>>> > more. I
>>>>>> > definitely do not want to replace the current installation method for
>>>>>> > experienced
>>>>>> > users - for them using --constraint requirements.txt is exactly what 
>>>>>> > they
>>>>>> > need.
>>>>>> > The only problem I am trying to solve with that is "repeatability" of
>>>>>> > installation.
>>>>>> >
>>>>>> > Maybe "apache-airflow-quickinstall" or something like that would be 
>>>>>> > better
>>>>>> > than "apache-airflow-pinned" or "apache-airflow-repeatable-install" or
>>>>>> > something like that. I think about it as a "flavour" of ariflow rather 
>>>>>> > than
>>>>>> > anything else. I even originally implemented it as [pinned] extra 
>>>>>> > where I
>>>>>> > pinned all requirements. Unfortunately I found that if you have
>>>>>> > main requirement without limits, adding the same requirement as extra 
>>>>>> > with
>>>>>> > == does not make it pinned :(.  That was my original plan.
>>>>>> >
>>>>>> >
>>>>>> > > Btw I have been on "we should have pinned dependency" camp as Airflow
>>>>>> > > should definitely install without breaking since day-1 but I think a
>>>>>> > > separate "-pinned" package won't solve that issue.
>>>>>> > >
>>>>>> >
>>>>>> > Ah yeah we went the same route. I do not think we can solve the
>>>>>> > "library vs. app" problem easily. This is a bit of "eat-and-have-cake"
>>>>>> > at the same time. I know people have problems
>>>>>> > with conflicting dependencies when they are trying to install libraries
>>>>>> > with different requirements. And I am not even trying to solve that
>>>>>> > problem now. Not even close. This requires some other solution
>>>>>> > (for example separate virtualenvs with different dependencies
>>>>>> > build from wheels on per-task basis). But that's something much further
>>>>>> > in the future (if at all).
>>>>>> >
>>>>>> >
>>>>>> > >
>>>>>> > > WDYT? Also please do let me know if I have misunderstood something
>>>>>> > > (definitely possible :D).
>>>>>> > >
>>>>>> > > Regards,
>>>>>> > > Kaxil
>>>>>> >
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>>
>>>>> Jarek Potiuk
>>>>> Polidea | Principal Software Engineer
>>>>>
>>>>> M: +48 660 796 129
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>>
>>>> Jarek Potiuk
>>>> Polidea | Principal Software Engineer
>>>>
>>>> M: +48 660 796 129
>>>>
>>>>
>>>
>>>
>>> --
>>>
>>> Jarek Potiuk
>>> Polidea | Principal Software Engineer
>>>
>>> M: +48 660 796 129
>>>
>>>
>>
>>
>> --
>>
>> Jarek Potiuk
>> Polidea | Principal Software Engineer
>>
>> M: +48 660 796 129
>>
>>
>
>
> --
>
> Jarek Potiuk
> Polidea | Principal Software Engineer
>
> M: +48 660 796 129
>
>


-- 

Jarek Potiuk
Polidea | Principal Software Engineer

M: +48 660 796 129

Reply via email to