Cool. I'd love to have the caplog issue fixed without removing caplog
convenience. Let's see if we can get the missing 20% confidence back to
make sure no more time of contributors and maintainers is lost because of
this one

On Sat, Feb 15, 2025 at 10:35 PM Ash Berlin-Taylor <a...@apache.org> wrote:

>
>
> @pytest.fixture(name="caplog")
> def override_caplog(request):
>     fixture = pytest.LogCaptureFixture(request.node)
>     yield fixture
>     fixture._finalize()
>
>     if "airflow.logging_config" in sys.modules:
>         import airflow.logging_config
>
>         airflow.logging_config.configure_logging()
>
>
> I am 80% confident that this is going to fix the flakeyness from caplog.
> I’m introducing it in my Trigger re-work PR.
>
> > On 7 Feb 2025, at 13:17, Jarek Potiuk <ja...@potiuk.com> wrote:
> >
> > As clearly explained in the above example - this is not a problem with
> > databricsk and azure. This is aproblem in Azure BECAUSE we move
> Databricks.
> > So clearly a side effect of one group of tests on another. As such - we
> > have no idea how many other cases we have like that and who might be
> > causing it. We only know that a lot of time of people who try to keep the
> > main green is lost because of that.
> >
> >> This is not very collaborative and accepting that people have a life
> > outside of their computer’s Jarek.
> >
> > No. It's collaborative. This is how our social contract works. If you
> want
> > to change the rules and have good reasoning for that - propose a change.
> > And you still can propose to revert the decision in a separate thread.
> This
> > happened in the past and it's perfectly fine.
> >
> > We cannot completely change the social agreement we have on decision
> making
> > work in our community because one person missed a thread on a really
> > insignificant thing. We should just accept it and follow, or propose to
> > change a contract. That's how a community works.
> >
> >
> > J
> >
> >
> >
> > On Fri, Feb 7, 2025 at 2:05 PM Jarek Potiuk <ja...@potiuk.com> wrote:
> >
> >>> How is it flakey though? This is the part I’m asking for explaining
> >> about. Surely it’s messed up and _ALWAYS_ messed up?
> >>
> >> See above example posted in the thread. It explains everything. I am not
> >> sure if you missed it, but it's the third time I am explaining it and
> you
> >> keep on asking. The example I have shows not only the copy & paste of
> the
> >> problem, but link to the exact failure.
> >>
> >> On Fri, Feb 7, 2025 at 2:00 PM Ash Berlin-Taylor <a...@apache.org>
> wrote:
> >>
> >>> How does caplog introduce flakiness though? That still isn’t
> >>>
> >>>
> >>>> On 7 Feb 2025, at 11:07, Jarek Potiuk <ja...@potiuk.com> wrote:
> >>>>
> >>>> Ash - take a close look at the example I sent. Maybe you missed it.
> >>>>
> >>>> * Caplog gets logs from loggers - If your loggers are messed 1000
> tests
> >>>> before and not restored - you will have failing tests. Side effects
> >>>
> >>> How is it flakey though? This is the part I’m asking for explaining
> >>> about. Surely it’s messed up and _ALWAYS_ messed up?
> >>>
> >>>>
> >>>> * Mocking logs is mocking lag method call. It always work and is
> >>>> side-effect resistant
> >>>>
> >>>>
> >>>> I would only agree to restore back Caplog usag (which as I mentioned
> >>>> requires new proposal, new consensus or voting as the consensus is
> >>> already
> >>>> reached). I will expect the person proposing it to track all the cases
> >>>> where side effects have been causing problems
> >>>
> >>> This is not very collaborative and accepting that people have a life
> >>> outside of their computer’s Jarek.
> >>>
> >>>
> >>> Here’s a proposal for you: don’t create needless code churn and
> wholesale
> >>> replace caplog, instead only swap if it’s not actually a problem.
> Replace
> >>> it only when it is noticed as being flakey - which you’ve pointed out
> two
> >>> cases, Databricks and Azure.. We have 287 instances of `caplog` in our
> >>> tests. How many are actually flakey? Why change things that aren’t
> broken?
> >>>
> >>>>
> >>>>
> >>>> This happened (and let me repeat it again - I hope it will not be
> again
> >>>> ignored this time). About 10 to 15 times in the last few weeks.
> >>>>
> >>>> Ash - if you want to propose restoring Caplog usage, please start a
> new
> >>>> thread. Ideally with explanation why.
> >>>>
> >>>> J.
> >>>>
> >>>> pt., 7 lut 2025, 10:15 użytkownik Ash Berlin-Taylor <a...@apache.org>
> >>>> napisał:
> >>>>
> >>>>> How does replacing caplog with mocking help anything, assuming that
> you
> >>>>> test what is logged against the mock?
> >>>>>
> >>>>> Please explain it to me like I’m 5, cos I don’t see how it makes the
> >>>>> blindest bit of difference. Testing what is on the caplog output vs
> >>> what is
> >>>>> recorded on the mock log object is functional identical. If caplog
> >>> tests
> >>>>> are flakey then mock log tests will be equally flakey. The caplog
> >>> process
> >>>>> itself is not flakey. It’s what is producing the log that is flakey,
> >>> no?
> >>>>>
> >>>>> If you are talking about removing logging tests entirely if they
> aren’t
> >>>>> providing value: sure, but that isn’t what I understood this to be
> >>> about.
> >>>>>
> >>>>>> On 6 Feb 2025, at 19:41, Jarek Potiuk <ja...@potiuk.com> wrote:
> >>>>>>
> >>>>>> BTW. Ash -  good example of why we want to get rid of caplog is
> here:
> >>>>>>
> >>>>>>
> >>>>>
> >>>
> https://github.com/apache/airflow/actions/runs/13185566641/job/36807274106
> >>>>>>
> >>>>>> This is PR to move databricks provider to the new provider
> structure.
> >>>>>> Suddenly in that PR  ***WASB*** (yes - we moved **DATABRICKS**)
> tests
> >>> are
> >>>>>> failing:
> >>>>>>
> >>>>>>
> >>>>>> =================================== FAILURES
> >>>>>> ===================================
> >>>>>> 2559
> >>>>>> <
> >>>>>
> >>>
> https://github.com/apache/airflow/actions/runs/13185566641/job/36807274106#step:6:2576
> >>>>>> _______
> >>>>>> TestWasbHook.test_delete_container_generic_exception[ValueError]
> >>> _______
> >>>>>> 2560
> >>>>>> <
> >>>>>
> >>>
> https://github.com/apache/airflow/actions/runs/13185566641/job/36807274106#step:6:2577
> >>>>>>
> >>>>>> providers/tests/microsoft/azure/hooks/test_wasb.py:539: in
> >>>>>> test_delete_container_generic_exception
> >>>>>> 2561
> >>>>>> <
> >>>>>
> >>>
> https://github.com/apache/airflow/actions/runs/13185566641/job/36807274106#step:6:2578
> >>>>>>
> >>>>>> assert "Error deleting container: mycontainer" in caplog.text
> >>>>>> 2562
> >>>>>> <
> >>>>>
> >>>
> https://github.com/apache/airflow/actions/runs/13185566641/job/36807274106#step:6:2579
> >>>>>> E
> >>>>>> AssertionError: assert 'Error deleting container: mycontainer' in ''
> >>>>>> 2563
> >>>>>> <
> >>>>>
> >>>
> https://github.com/apache/airflow/actions/runs/13185566641/job/36807274106#step:6:2580
> >>>>>> E
> >>>>>> + where '' = <_pytest.logging.LogCaptureFixture object at
> >>>>>> 0x7fa7109dcbe0>.text
> >>>>>> 2564
> >>>>>> <
> >>>>>
> >>>
> https://github.com/apache/airflow/actions/runs/13185566641/job/36807274106#step:6:2581
> >>>>>> ______
> >>>>>> TestWasbHook.test_delete_container_generic_exception[RuntimeError]
> >>> ______
> >>>>>> 2565
> >>>>>> <
> >>>>>
> >>>
> https://github.com/apache/airflow/actions/runs/13185566641/job/36807274106#step:6:2582
> >>>>>>
> >>>>>> providers/tests/microsoft/azure/hooks/test_wasb.py:539: in
> >>>>>> test_delete_container_generic_exception
> >>>>>> 2566
> >>>>>> <
> >>>>>
> >>>
> https://github.com/apache/airflow/actions/runs/13185566641/job/36807274106#step:6:2583
> >>>>>>
> >>>>>> assert "Error deleting container: mycontainer" in caplog.text
> >>>>>> 2567
> >>>>>> <
> >>>>>
> >>>
> https://github.com/apache/airflow/actions/runs/13185566641/job/36807274106#step:6:2584
> >>>>>> E
> >>>>>> AssertionError: assert 'Error deleting container: mycontainer' in ''
> >>>>>> 2568
> >>>>>> <
> >>>>>
> >>>
> https://github.com/apache/airflow/actions/runs/13185566641/job/36807274106#step:6:2585
> >>>>>> E
> >>>>>> + where '' = <_pytest.logging.LogCaptureFixture object at
> >>>>>> 0x7fa710a65040>.text
> >>>>>> 2569
> >>>>>> <
> >>>>>
> >>>
> https://github.com/apache/airflow/actions/runs/13185566641/job/36807274106#step:6:2586
> >>>>>> ____________
> >>>>>> TestWasbHook.test_delete_container_resource_not_found _____________
> >>>>>> 2570
> >>>>>> <
> >>>>>
> >>>
> https://github.com/apache/airflow/actions/runs/13185566641/job/36807274106#step:6:2587
> >>>>>>
> >>>>>> providers/tests/microsoft/azure/hooks/test_wasb.py:548: in
> >>>>>> test_delete_container_resource_not_found
> >>>>>> 2571
> >>>>>> <
> >>>>>
> >>>
> https://github.com/apache/airflow/actions/runs/13185566641/job/36807274106#step:6:2588
> >>>>>>
> >>>>>> assert "Unable to delete container mycontainer (not found)" in
> >>>>> caplog.text
> >>>>>> 2572
> >>>>>> <
> >>>>>
> >>>
> https://github.com/apache/airflow/actions/runs/13185566641/job/36807274106#step:6:2589
> >>>>>> E
> >>>>>> AssertionError: assert 'Unable to delete container mycontainer (not
> >>>>> found)'
> >>>>>> in ''
> >>>>>> 2573
> >>>>>> <
> >>>>>
> >>>
> https://github.com/apache/airflow/actions/runs/13185566641/job/36807274106#step:6:2590
> >>>>>> E
> >>>>>> + where '' = <_pytest.logging.LogCaptureFixture object at
> >>>>>> 0x7fa710c37bb0>.text
> >>>>>>
> >>>>>>
> >>>>>> And those are the failing tests. They are using caplog and I am not
> >>> sure
> >>>>> if
> >>>>>> you can figure out why suddenly the expected log entries cannot be
> >>> found
> >>>>> in
> >>>>>> caplog.
> >>>>>>
> >>>>>> My best guess is that somewhere, some tests (databricks) are
> >>>>> reconfiguring
> >>>>>> the loggers, and it messes up with caplog. Classic side effects.
> >>>>>>
> >>>>>> Mocking log methods directly avoids side-effects involved with
> logger
> >>>>>> reconfiguraton.  This is about 10th time we have this problem while
> >>>>> moving
> >>>>>> providers.
> >>>>>> Here are the tests that are failing.
> >>>>>>
> >>>>>>  @pytest.mark.parametrize("exc", [ValueError, RuntimeError])
> >>>>>>  def test_delete_container_generic_exception(self, exc:
> >>>>> type[Exception],
> >>>>>> caplog):
> >>>>>>      hook = WasbHook(wasb_conn_id=self.azure_shared_key_test)
> >>>>>>      with mock.patch.object(WasbHook, "_get_container_client") as m:
> >>>>>>          m.return_value.delete_container.side_effect =
> >>>>>> exc("FakeException")
> >>>>>>          caplog.clear()
> >>>>>>          caplog.set_level("ERROR")
> >>>>>>          with pytest.raises(exc, match="FakeException"):
> >>>>>>              hook.delete_container("mycontainer")
> >>>>>>          assert "Error deleting container: mycontainer" in
> >>> caplog.text
> >>>>>>
> >>>>>>  def test_delete_container_resource_not_found(self, caplog):
> >>>>>>      hook = WasbHook(wasb_conn_id=self.azure_shared_key_test)
> >>>>>>      with mock.patch.object(WasbHook, "_get_container_client") as m:
> >>>>>>          m.return_value.delete_container.side_effect =
> >>>>>> ResourceNotFoundError("FakeException")
> >>>>>>          caplog.clear()
> >>>>>>          caplog.set_level("WARNING")
> >>>>>>          hook.delete_container("mycontainer")
> >>>>>>          assert "Unable to delete container mycontainer (not found)"
> >>> in
> >>>>>> caplog.text
> >>>>>>
> >>>>>>  @mock.patch.object(WasbHook, "delete_blobs")
> >>>>>>  def test_delete_single_blob(self, delete_blobs,
> >>>>>> mocked_blob_service_client):
> >>>>>>      hook = WasbHook(wasb_conn_id=self.azure_shared_key_test)
> >>>>>>      hook.delete_file("container", "blob", is_prefix=False)
> >>>>>>      delete_blobs.assert_called_once_with("container", "blob")
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On Thu, Feb 6, 2025 at 3:56 PM Jarek Potiuk <ja...@potiuk.com>
> wrote:
> >>>>>>
> >>>>>>> Just - note that the 72 hrs are specifically designed to cover the
> >>>>>>> weekend. So when there are such - rather insignificant - issues, I
> >>>>> think we
> >>>>>>> should be ok. We have had many of our votes - including the
> >>> absolutely
> >>>>> most
> >>>>>>> important - releases - lasting 72 hours over weekend and it has
> never
> >>>>> been
> >>>>>>> a problem, we only extended it including weekends where the matters
> >>> were
> >>>>>>> really important and we expected them to be contentious (this one
> was
> >>>>> not).
> >>>>>>>
> >>>>>>> Does it mean Ash that we propose extending all kinds of consensus
> and
> >>>>>>> voting in this way ? It's the first time it has been raised, so
> maybe
> >>>>>>> that's a good idea to make it longer now when we are all
> additionally
> >>>>> busy
> >>>>>>> with Airflow 3, but I don't see any reason why it should be a
> problem
> >>>>> "in
> >>>>>>> general". WDYT?
> >>>>>>>
> >>>>>>> Note that your objection is also 5 days after the initial message,
> so
> >>>>> even
> >>>>>>> if it would be 2 days more, that would not change anything.
> >>>>>>>
> >>>>>>> IMHO the consensus is reached. and If you want to revert that you
> >>> should
> >>>>>>> start a new thread Ash.
> >>>>>>>
> >>>>>>> J.
> >>>>>>>
> >>>>>>> On Thu, Feb 6, 2025 at 3:35 PM Ash Berlin-Taylor <a...@apache.org>
> >>>>> wrote:
> >>>>>>>
> >>>>>>>> Please be aware of time when you are proposing this.  48 out of
> the
> >>> 72
> >>>>>>>> hours your had here were over the weekend which doesn’t give many
> >>>>> people
> >>>>>>>> adequate time to read and think upon this.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Because I’m -1 on this as I don’t have enough info (see my other
> >>> email)
> >>>>>>>> to understand why this change makes the first bit of difference.
> >>>>>>>>
> >>>>>>>>> On 4 Feb 2025, at 22:38, Buğra Öztürk <ozturkbugr...@gmail.com>
> >>>>> wrote:
> >>>>>>>>>
> >>>>>>>>> Hey all,
> >>>>>>>>>
> >>>>>>>>> The consensus has been reached.
> >>>>>>>>>
> >>>>>>>>> Regards,
> >>>>>>>>>
> >>>>>>>>> On Sat, Feb 1, 2025 at 5:38 PM Buğra Öztürk <
> >>> ozturkbugr...@gmail.com>
> >>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>>> Hello everyone,
> >>>>>>>>>>
> >>>>>>>>>> There have been recurring discussions about minimizing the usage
> >>> of
> >>>>>>>> caplog
> >>>>>>>>>> in unit tests. If logs are necessary for a test, we should mock
> >>> them
> >>>>>>>> unless
> >>>>>>>>>> the test strictly requires actual log output. Otherwise, the
> >>> caplog
> >>>>>>>> tests
> >>>>>>>>>> should be removed and logs should be mocked.
> >>>>>>>>>>
> >>>>>>>>>> During recent large migrations, changes and ongoing CI/CD
> efforts,
> >>>>>>>> caplog
> >>>>>>>>>> has proven to be flaky in unit tests. They frequently cause red
> >>>>>>>> pipelines
> >>>>>>>>>> in PRs and scheduled CI/CD runs. There have been local
> >>> discussions on
> >>>>>>>>>> removing caplog from unit tests. To formalize this, I propose a
> >>> lazy
> >>>>>>>>>> consensus to remove and prevent caplog usage in unit tests,
> >>> ensuring
> >>>>>>>> logs
> >>>>>>>>>> are mocked when needed and disallowing caplog without mocking
> >>> unless
> >>>>>>>>>> explicitly approved.
> >>>>>>>>>>
> >>>>>>>>>> *Why should caplog be avoided?*
> >>>>>>>>>>
> >>>>>>>>>> - Big maintenance effort on CI/CD
> >>>>>>>>>> - Instability
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> *What should we do instead if we use caplog in a unit test?*
> >>>>>>>>>>
> >>>>>>>>>> - Mocking the Log
> >>>>>>>>>>
> >>>>>>>>>> *In the Scope of this consensus*
> >>>>>>>>>>
> >>>>>>>>>> - Remove caplog usage from unit tests if possible
> >>>>>>>>>> - Mock logs and remove caplog from unit tests if possible
> >>>>>>>>>> - Exceptional cases will be subject to maintainer approval
> >>>>>>>>>> - Prevent caplog to be included in unit tests without explicitly
> >>>>>>>>>> mocking the log
> >>>>>>>>>>
> >>>>>>>>>> *Action Items related to the above Scope:*
> >>>>>>>>>>
> >>>>>>>>>> - Scan and replace caplog tests with mocking where possible
> >>>>>>>>>> - Scan and remove caplog tests where possible
> >>>>>>>>>> - Include CI check to prevent adding additional caplog tests
> >>> without
> >>>>>>>>>> Mocking and/or without approval from a maintainer to allow
> >>>>>>>> flexibility in
> >>>>>>>>>> some exceptional cases
> >>>>>>>>>> - Create documentation with examples for implementing tests
> using
> >>>>>>>> logs
> >>>>>>>>>> without caplog (e.g., using mocking or avoiding logs entirely)
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> The lazy consensus will be reached on 2025-02-04,  (unless
> someone
> >>>>>>>> surfaces
> >>>>>>>>>> an objection).
> >>>>>>>>>> --
> >>>>>>>>>> Bugra Ozturk
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> --
> >>>>>>>>> Bugra Ozturk
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>> ---------------------------------------------------------------------
> >>>>>>>> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
> >>>>>>>> For additional commands, e-mail: dev-h...@airflow.apache.org
> >>>>>>>>
> >>>>>>>>
> >>>>>
> >>>>>
> >>>
> >>>
> >>> ---------------------------------------------------------------------
> >>> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
> >>> For additional commands, e-mail: dev-h...@airflow.apache.org
> >>>
> >>>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
> For additional commands, e-mail: dev-h...@airflow.apache.org
>
>

Reply via email to