Thanks a lot Jarek, will do!

On Wed, 10 Nov 2021, 13:40 Jarek Potiuk, <[email protected]> wrote:

> Merged! Please rebase (Khalid - you can remove that workaround of yours)
> and let me know.
>
> There is one failure that happened in my tests:
>
> https://github.com/apache/airflow/runs/4165358689?check_suite_focus=true
> - but we can keep an eye on this one and try to find the reason
> separately if it keeps recurring.
>
> J.
>
> On Wed, Nov 10, 2021 at 12:49 PM Jarek Potiuk <[email protected]> wrote:
>
>> Fix being tested in: https://github.com/apache/airflow/pull/19512
>> (committer PR) and https://github.com/apache/airflow/pull/19514 (regular
>> user PR).
>>
>>
>> On Wed, Nov 10, 2021 at 11:25 AM Jarek Potiuk <[email protected]> wrote:
>>
>>> OK. I took a look. It looks like the "Core" tests indeed briefly (and
>>> sometimes for a longer time) exceed 50% of the memory available on GitHub
>>> runners. I do not think optimizing them now makes much sense - because
>>> even if we optimize them now, they will likely soon reach 50-60% of
>>> available memory again, which - when there are other parallel tests
>>> running - might easily cause an OOM.
>>>
>>> It looks like those are only "Core"-type tests, so the solution will
>>> be (similarly to the "Integration" tests) to separate them out into a
>>> non-parallel run on GitHub runners.
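>>>
>>> To illustrate the idea (a rough sketch only - not the actual Breeze/CI
>>> code; the test-type lists and the run_test_type() helper are hypothetical),
>>> the split could look roughly like this in Python:
>>>
>>> import os
>>> import subprocess
>>>
>>> # Test types known to use a lot of memory - run these one at a time
>>> # on memory-constrained GitHub-hosted runners (~7 GiB RAM).
>>> SEQUENTIAL_TEST_TYPES = ["Core", "Integration"]
>>> PARALLEL_TEST_TYPES = ["Always", "API", "CLI", "Providers", "WWW"]
>>>
>>> def run_test_type(test_type: str) -> subprocess.Popen:
>>>     # Hypothetical entrypoint - in the real CI each test type runs in
>>>     # its own docker-compose project.
>>>     return subprocess.Popen(["./scripts/ci/run_tests.sh", test_type])
>>>
>>> def run_all_test_types() -> None:
>>>     on_github_runner = os.environ.get("GITHUB_ACTIONS") == "true"
>>>     sequential = SEQUENTIAL_TEST_TYPES if on_github_runner else []
>>>     parallel = [t for t in SEQUENTIAL_TEST_TYPES + PARALLEL_TEST_TYPES
>>>                 if t not in sequential]
>>>     # Memory-hungry types run sequentially, everything else in parallel.
>>>     for test_type in sequential:
>>>         run_test_type(test_type).wait()
>>>     procs = [run_test_type(t) for t in parallel]
>>>     for proc in procs:
>>>         proc.wait()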
>>>
>>> On Tue, Nov 9, 2021 at 9:33 PM Jarek Potiuk <[email protected]> wrote:
>>>
>>>> Yep. Apparently one of the recent tests is using too much memory. I had
>>>> some private errands that made me less available for the last few days - but
>>>> I will have time to catch up tonight/tomorrow.
>>>>
>>>> Thanks for changing the "parallel" level in your PR - that will give me
>>>> more data points. I've just re-run both PRs with the "debug-ci-resources"
>>>> label. This is our "debug" label to show resource use during the build, and
>>>> I might be able to find and fix the root cause.
>>>>
>>>> For the future - in case any other committer wants to investigate it,
>>>> setting the "debug-ci-resources" label turns on a debugging mode that shows
>>>> this information periodically alongside the progress of the tests - it can
>>>> be helpful in determining what caused the OOM:
>>>>
>>>> CONTAINER ID   NAME                                            CPU %     MEM USAGE / LIMIT     MEM %     NET I/O           BLOCK I/O         PIDS
>>>> c46832148ff7   airflow-always-mssql_airflow_run_e59b6039c3d8   99.59%    365.1MiB / 6.789GiB   5.25%     1.62MB / 3.33MB   8.97MB / 20.5kB   8
>>>> f4d2a192d6fc   airflow-always-mssql_mssqlsetup_1               0.00%     0B / 0B               0.00%     0B / 0B           0B / 0B           0
>>>> a668cdedc717   airflow-api-mssql_airflow_run_bcc466077ac0      35.07%    431.4MiB / 6.789GiB   6.21%     2.26MB / 4.47MB   73.2MB / 20.5kB   8
>>>> f306f4221ba1   airflow-api-mssql_mssqlsetup_1                  0.00%     0B / 0B               0.00%     0B / 0B           0B / 0B           0
>>>> 7f10748e9496   airflow-api-mssql_mssql_1                       30.66%    735.5MiB / 6.789GiB   10.58%    4.47MB / 2.26MB   36.8MB / 124MB    132
>>>> 8b5ca767ed0c   airflow-always-mssql_mssql_1                    12.59%    716.5MiB / 6.789GiB   10.31%    3.33MB / 1.63MB   36.7MB / 52.7MB   131
>>>>
>>>>               total        used        free      shared  buff/cache   available
>>>> Mem:           6951        2939         200           6        3811        3702
>>>> Swap:             0           0           0
>>>>
>>>> Filesystem      Size  Used Avail Use% Mounted on
>>>> /dev/root        84G   51G   33G  61% /
>>>> /dev/sda15      105M  5.2M  100M   5% /boot/efi
>>>> /dev/sdb1        14G  4.1G  9.0G  32% /mnt
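>>>>
>>>> (In case anyone wants to reproduce similar reporting locally - this is a
>>>> minimal sketch, not the actual script behind the label, and it assumes
>>>> docker, free and df are available on PATH:)
>>>>
>>>> import subprocess
>>>> import time
>>>>
>>>> def report_resources_periodically(interval_seconds: int = 60) -> None:
>>>>     """Dump container and host resource usage, like the output above."""
>>>>     while True:
>>>>         for cmd in (
>>>>             ["docker", "stats", "--no-stream"],  # per-container CPU / memory
>>>>             ["free", "-m"],                      # host memory in MiB
>>>>             ["df", "-h"],                        # disk usage
>>>>         ):
>>>>             subprocess.run(cmd, check=False)
>>>>         time.sleep(interval_seconds)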
>>>>
>>>> J.
>>>>
>>>>
>>>> On Tue, Nov 9, 2021 at 9:19 PM Oliveira, Niko
>>>> <[email protected]> wrote:
>>>>
>>>>> Hey all,
>>>>>
>>>>>
>>>>> Just to throw another data point in the ring, I've had a PR
>>>>> <https://github.com/apache/airflow/pull/19410> stuck in the same way
>>>>> as well. Several retries are all failing with the same OOM.
>>>>>
>>>>>
>>>>> I've also dug through the GitHub Actions history and found a few
>>>>> others, so it doesn't seem to be just a one-off.
>>>>>
>>>>>
>>>>> Cheers,
>>>>> Niko
>>>>> ------------------------------
>>>>> *From:* Khalid Mammadov <[email protected]>
>>>>> *Sent:* Tuesday, November 9, 2021 6:24 AM
>>>>> *To:* [email protected]
>>>>> *Subject:* [EXTERNAL] OOM issue in the CI
>>>>>
>>>>> Hi Devs,
>>>>>
>>>>> I have been working on the below PR and have run into an OOM issue during
>>>>> testing on GitHub Actions (you can see it in the commit history).
>>>>>
>>>>> https://github.com/apache/airflow/pull/19139/files
>>>>>
>>>>> The tests for the Postgres, MySQL, etc. databases fail due to OOM and
>>>>> Docker gets killed.
>>>>>
>>>>> I have *temporarily* reduced parallelism to 1 "in the code" (the only
>>>>> extra change in the PR), and with that it passes all the checks, which
>>>>> confirms the issue.
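>>>>>
>>>>> (Conceptually the temporary workaround just pins the number of parallel
>>>>> test runners to one - something along these lines; the variable name
>>>>> below is made up for illustration, the real change is in the PR:)
>>>>>
>>>>> # Normally this would follow the number of CPUs on the runner, e.g.
>>>>> # MAX_PARALLEL_TEST_JOBS = os.cpu_count(); forcing it to 1 avoids the
>>>>> # OOM at the cost of a slower build.
>>>>> MAX_PARALLEL_TEST_JOBS = 1  # temporary workaround until the OOM is fixed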
>>>>>
>>>>>
>>>>> I was hoping you could advise on the best course of action in this
>>>>> situation - should I force parallelism to 1 to get all the checks to pass,
>>>>> or is there some other way to solve the OOM?
>>>>>
>>>>> Any help would be appreciated.
>>>>>
>>>>>
>>>>> Thanks in advance
>>>>>
>>>>> Khalid
>>>>>
>>>>
