Yep. Apparently one of the recent tests is using too much memory. I had
some private errands that made me less available the last few days, but I
will have time to catch up tonight/tomorrow.

Thanks for changing the "parallel" level in your PR - that will give me
more data points. I've just re-run both PRs with the "debug-ci-resources"
label. This is our "debug" label that shows resource use during the build,
and I might be able to find and fix the root cause.
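
In case it helps, the label can also be set from the command line. A minimal
sketch using the GitHub CLI (assuming gh is installed and authenticated;
<PR_NUMBER> is just a placeholder for the PR you want to debug):

gh pr edit <PR_NUMBER> --repo apache/airflow --add-label "debug-ci-resources"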

For the future, in case any other committer wants to investigate it:
setting the "debug-ci-resources" label turns on a debugging mode that shows
this information periodically alongside the progress of the tests. It can
be helpful in determining what caused the OOM:

CONTAINER ID   NAME                                            CPU %     MEM USAGE / LIMIT     MEM %     NET I/O           BLOCK I/O         PIDS
c46832148ff7   airflow-always-mssql_airflow_run_e59b6039c3d8   99.59%    365.1MiB / 6.789GiB   5.25%     1.62MB / 3.33MB   8.97MB / 20.5kB   8
f4d2a192d6fc   airflow-always-mssql_mssqlsetup_1               0.00%     0B / 0B               0.00%     0B / 0B           0B / 0B           0
a668cdedc717   airflow-api-mssql_airflow_run_bcc466077ac0      35.07%    431.4MiB / 6.789GiB   6.21%     2.26MB / 4.47MB   73.2MB / 20.5kB   8
f306f4221ba1   airflow-api-mssql_mssqlsetup_1                  0.00%     0B / 0B               0.00%     0B / 0B           0B / 0B           0
7f10748e9496   airflow-api-mssql_mssql_1                       30.66%    735.5MiB / 6.789GiB   10.58%    4.47MB / 2.26MB   36.8MB / 124MB    132
8b5ca767ed0c   airflow-always-mssql_mssql_1                    12.59%    716.5MiB / 6.789GiB   10.31%    3.33MB / 1.63MB   36.7MB / 52.7MB   131

              total        used        free      shared  buff/cache   available
Mem:           6951        2939         200           6        3811        3702
Swap:             0           0           0

Filesystem      Size  Used Avail Use% Mounted on
/dev/root        84G   51G   33G  61% /
/dev/sda15      105M  5.2M  100M   5% /boot/efi
/dev/sdb1        14G  4.1G  9.0G  32% /mnt
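
For reference, the snapshot above is roughly what docker stats, free and df
print; a minimal sketch of commands to get an equivalent picture locally
while the tests are running (the exact invocation used by the CI debug mode
may differ):

docker stats --no-stream
free -m
df -h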

J.


On Tue, Nov 9, 2021 at 9:19 PM Oliveira, Niko <[email protected]>
wrote:

> Hey all,
>
>
> Just to throw another data point in the ring, I've had a PR
> <https://github.com/apache/airflow/pull/19410> stuck in the same way as
> well. Several retries are all failing with the same OOM.
>
>
> I've also dug through the GitHub Actions history and found a few others,
> so it doesn't seem to be just a one-off.
>
>
> Cheers,
> Niko
> ------------------------------
> *From:* Khalid Mammadov <[email protected]>
> *Sent:* Tuesday, November 9, 2021 6:24 AM
> *To:* [email protected]
> *Subject:* [EXTERNAL] OOM issue in the CI
>
>
> Hi Devs,
>
> I have been working on the PR below and have run into an OOM issue during
> testing on GitHub Actions (you can see it in the commit history).
>
> https://github.com/apache/airflow/pull/19139/files
>
> The tests for databases (Postgres, MySQL, etc.) fail due to OOM and Docker
> gets killed.
>
> I have reduced parallelism to 1 "in the code" *temporarily* (the only
> extra change in the PR) and it passes all the checks, which confirms the
> issue.
>
>
> I was hoping you could advise on the best course of action in this
> situation: should I force parallelism to 1 to get all checks to pass, or
> is there some other way to solve the OOM?
>
> Any help would be appreciated.
>
>
> Thanks in advance
>
> Khalid
>
