Just to let you know.

The issue looks like it is still there:

https://github.com/apache/airflow/runs/4167464563?check_suite_focus=true



On 10/11/2021 13:40, Jarek Potiuk wrote:
Merged! Please rebase (Khalid, you can remove that workaround of yours) and let me know.

There is one failure that happened in my tests:

https://github.com/apache/airflow/runs/4165358689?check_suite_focus=true - but we can observe the results of this one and try to find the reason separately if it continues to repeat.

J.

On Wed, Nov 10, 2021 at 12:49 PM Jarek Potiuk <[email protected]> wrote:

    Fix being tested in: https://github.com/apache/airflow/pull/19512
    (committer PR) and https://github.com/apache/airflow/pull/19514
    (regular user PR).


    On Wed, Nov 10, 2021 at 11:25 AM Jarek Potiuk <[email protected]> wrote:

        OK. I took a look. It looks like the "Core" tests indeed go
        briefly (and sometimes for a longer time) over 50% of the
        memory available on GitHub runners. I think optimizing them
        now makes little sense - because even if we optimize them now,
        they will likely soon reach 50-60% of available memory again,
        which, when there are other parallel tests running, might
        easily lead to OOM.

        It looks like those are only "Core" type tests, so the
        solution will be (similarly to the "Integration" tests) to
        separate them out into a non-parallel run on GitHub runners.
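
        To illustrate the idea (this is only a sketch, not the actual
        Breeze/CI code - the test type names, the pytest command and
        the pool size below are made up for the example), "separating
        out" means running the heavy "Core" type sequentially while
        the remaining types still share a parallel pool:

        # Sketch only - not the real Airflow CI scripts. Test types and the
        # pytest invocation below are illustrative.
        import subprocess
        from concurrent.futures import ThreadPoolExecutor

        TEST_TYPES = ["Core", "API", "CLI", "Providers", "WWW", "Always"]  # example list
        SEQUENTIAL_TYPES = {"Core"}  # types that tend to exceed ~50% of runner memory

        def run_test_type(test_type: str) -> int:
            # Hypothetical command - the real CI runs tests through Breeze.
            return subprocess.call(["pytest", f"tests/{test_type.lower()}"])

        def run_all() -> None:
            parallel = [t for t in TEST_TYPES if t not in SEQUENTIAL_TYPES]
            sequential = [t for t in TEST_TYPES if t in SEQUENTIAL_TYPES]

            # Lighter test types share a small parallel pool.
            with ThreadPoolExecutor(max_workers=4) as pool:
                results = list(pool.map(run_test_type, parallel))

            # Memory-heavy types run one at a time so they never compete for RAM.
            results += [run_test_type(t) for t in sequential]

            if any(code != 0 for code in results):
                raise SystemExit("some test types failed")

        if __name__ == "__main__":
            run_all()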

        On Tue, Nov 9, 2021 at 9:33 PM Jarek Potiuk <[email protected]> wrote:

            Yep. Apparently one of the recent tests is using too much
            memory. I had some private errands that made me less
            available over the last few days - but I will have time to
            catch up tonight/tomorrow.

            Thanks for changing the "parallel" level in your PR - that
            will give me more data points. I've just re-run both PRs
            with the "debug-ci-resources" label. This is our "debug"
            label to show resource use during the build, and I might
            be able to find and fix the root cause.

            For the future - in case any other committer wants to
            investigate it, setting the "debug-ci-resources" label
            turns on a debugging mode that shows this information
            periodically alongside the progress of the tests - it can
            be helpful in determining what caused the OOM:

            CONTAINER ID   NAME                                            CPU %     MEM USAGE / LIMIT     MEM %     NET I/O           BLOCK I/O         PIDS
            c46832148ff7   airflow-always-mssql_airflow_run_e59b6039c3d8   99.59%    365.1MiB / 6.789GiB   5.25%     1.62MB / 3.33MB   8.97MB / 20.5kB   8
            f4d2a192d6fc   airflow-always-mssql_mssqlsetup_1               0.00%     0B / 0B               0.00%     0B / 0B           0B / 0B           0
            a668cdedc717   airflow-api-mssql_airflow_run_bcc466077ac0      35.07%    431.4MiB / 6.789GiB   6.21%     2.26MB / 4.47MB   73.2MB / 20.5kB   8
            f306f4221ba1   airflow-api-mssql_mssqlsetup_1                  0.00%     0B / 0B               0.00%     0B / 0B           0B / 0B           0
            7f10748e9496   airflow-api-mssql_mssql_1                       30.66%    735.5MiB / 6.789GiB   10.58%    4.47MB / 2.26MB   36.8MB / 124MB    132
            8b5ca767ed0c   airflow-always-mssql_mssql_1                    12.59%    716.5MiB / 6.789GiB   10.31%    3.33MB / 1.63MB   36.7MB / 52.7MB   131

                           total        used        free      shared  buff/cache   available
            Mem:           6951        2939         200           6        3811        3702
            Swap:             0           0           0

            Filesystem      Size  Used Avail Use% Mounted on
            /dev/root        84G   51G   33G  61% /
            /dev/sda15      105M  5.2M  100M   5% /boot/efi
            /dev/sdb1        14G  4.1G  9.0G  32% /mnt

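            The dump above is just the output of standard commands. A
            minimal sketch of such a periodic monitor could look like
            the following (this is not the actual "debug-ci-resources"
            implementation - the interval and the exact commands are
            assumptions):

            # Minimal sketch of a periodic resource dump - not the real
            # "debug-ci-resources" code; interval and mounts are assumptions.
            import subprocess
            import time

            COMMANDS = [
                ["docker", "stats", "--no-stream"],  # per-container CPU/memory usage
                ["free", "-m"],                      # overall memory/swap in MiB
                ["df", "-h", "/", "/mnt"],           # disk usage of the interesting mounts
            ]

            def dump_resources() -> None:
                for cmd in COMMANDS:
                    print(f"$ {' '.join(cmd)}")
                    subprocess.run(cmd, check=False)
                    print()

            if __name__ == "__main__":
                while True:
                    dump_resources()
                    time.sleep(60)  # arbitrary interval for the sketch
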
            J.


            On Tue, Nov 9, 2021 at 9:19 PM Oliveira, Niko
            <[email protected]> wrote:

                Hey all,


                Just to throw another data point in the ring, I've had
                a PR <https://github.com/apache/airflow/pull/19410>
                stuck in the same way as well. Several retries are all
                failing with the same OOM.


                I've also dug through the GitHub Actions history and
                found a few others. So it doesn't seem to be just a
                one-off.


                Cheers,
                Niko

                
------------------------------------------------------------------------
                *From:* Khalid Mammadov <[email protected]>
                *Sent:* Tuesday, November 9, 2021 6:24 AM
                *To:* [email protected]
                *Subject:* [EXTERNAL] OOM issue in the CI


                Hi Devs,

                I have been working on the below PR and ran into an
                OOM issue during testing on GitHub Actions (you can
                see it in the commit history).

                https://github.com/apache/airflow/pull/19139/files

                The tests for the Postgres, MySQL, etc. databases fail
                due to OOM and Docker gets killed.

                I have reduced parallelism to 1 "in the code"
                *temporarily* (the only extra change in the PR) and it
                passes all the checks, which confirms the issue.
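
                For illustration only (the names and the per-job
                memory budget below are hypothetical, not variables
                from the Airflow CI scripts), here is a sketch of how
                parallelism could be derived from the memory actually
                available on the runner instead of being hard-coded
                to 1:

                # Illustrative sketch only - names and the per-job budget are
                # hypothetical, not taken from the Airflow CI scripts.
                import os

                SAFE_MEMORY_PER_JOB_MB = 3000  # assumed budget per test job

                def available_memory_mb() -> int:
                    # Linux-only; a real script might read MemAvailable from /proc/meminfo.
                    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") // (1024 * 1024)

                def get_parallelism() -> int:
                    by_memory = max(1, available_memory_mb() // SAFE_MEMORY_PER_JOB_MB)
                    by_cpu = os.cpu_count() or 1
                    return min(by_memory, by_cpu)

                if __name__ == "__main__":
                    # On a ~7 GB GitHub-hosted runner this prints 2 with these numbers.
                    print(get_parallelism())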


                I was hoping you could advise on the best course of
                action in this situation - should I force parallelism
                to 1 to get all checks passing, or is there some other
                way to solve the OOM?

                Any help would be appreciated.


                Thanks in advance

                Khalid
