How much of a problem is flakiness caused by minor pixel differences compared to overall flakiness? I looked at the top 10 flaky tests here <https://data.corp.google.com/sites/chrome_generic_flakiness_dashboard_datasite/top_flakes/?f=test_id:re:.*web_test.*> and none of them were flaky because of minor pixel differences.
70 tests is a manageable number and it seems reasonable to add fuzzy matching to them.

On Tuesday, August 2, 2022 at 9:04:00 AM UTC-7 Xianzhu Wang wrote:

> On Mon, Aug 1, 2022 at 10:36 AM Vivian Zhi (支文文) <[email protected]> wrote:
>
>> Thanks for valuable feedback! Stephen, Xianzhu, will see if we can add a
>> filter in results.html to grab those tests in range.
>
> The CL <https://chromium-review.googlesource.com/c/chromium/src/+/3803707>
> adding pixel diff filter in results.html has landed. Thanks Thorben!
>
> In this example results.html
> <https://test-results.appspot.com/data/layout_results/linux-rel/1086271/blink_web_tests%20%28with%20patch%29/layout-test-results/results.html>,
> you can examine the pixel results of tests that produced pixel differences
> matching a particular fuzzy rule in the following steps:
> 1. Enter a pixel difference filter, e.g. "channel_max:1-1", in the filter
>    input box;
> 2. Click the "All" button (as we show regressions only by default).
> You might want to switch to "side-by-side view" and click the image to
> examine the pixel values.
>
> With "channel_max:1-1" we can see all tests that produced pixel
> differences that can be tolerated with a fuzzy rule like <meta name=fuzzy
> content="0-1;0-1000000">. There are 70 such tests in the example
> results.html. All of them look benign to me. So perhaps a universal rule
> (for non wpt tests) is proper?
>
> On the other hand, even if we have such a universal rule, we can only
> recover 70 tests. Instead of applying the rule automatically, we can also
> manually modify these tests to include a meta fuzzy rule.
>
>> On Mon, Aug 1, 2022 at 8:40 AM Xianzhu Wang <[email protected]> wrote:
>>
>>> On Mon, Aug 1, 2022 at 4:25 AM Stephen Chenney <[email protected]> wrote:
>>>
>>>> Thanks for investigating the potential for fuzzy matching.
>>>>
>>>> Rendering Core continues to oppose a single fuzzy match rule across all
>>>> web_tests.
>>>> We have some tests where single pixel differences matter
>>>> (related to pixel snapping, for example) and a universal fuzzy match would
>>>> fail to identify problems with those. This came up in practice recently
>>>> when the GPU team enabled fuzzy matching without telling us, and expected
>>>> failing tests started passing when they shouldn't.
>>>
>>> I think a key difference between the original fuzzy matching rule and
>>> the rule proposed by Vivian is the ranges. With maxDifference=0-1, we
>>> should be able to catch most visible single pixel differences. What I'm not
>>> sure is whether a difference like rgb(1, 0, 0) vs rgb(0, 0, 0) (each
>>> component in the range of 0-255) should be treated as a failure in some
>>> cases.
>>>
>>>> Maybe specific sub teams have directories they could apply default fuzzy
>>>> matching to. My guess is that the same directories where it will work will
>>>> be directories with few failing tests, limiting the impact of a
>>>> per-directory approach.
>>>>
>>>> Is there a way to reproduce the sampling below with a side-by-side
>>>> comparison of the images? I would find it helpful to look through some of
>>>> the cases that would pass with <meta name="fuzzy" content="0-1;0-1000">,
>>>> for example.
>>>
>>> A filter by actual maxDifference and totalPixels in results.html will be
>>> helpful. I can add it when I get time.
>>>
>>>> Stephen.
>>>>
>>>> On Fri, Jul 29, 2022 at 8:20 PM 'Vivian Zhi (支文文)' via blink-dev <
>>>> [email protected]> wrote:
>>>>
>>>>> Hi blink-dev
>>>>>
>>>>> I would like to let you know that blink-engprod has added feature
>>>>> support for non-WPT fuzzy tests.
>>>>> It now allows both non-WPT reftests and
>>>>> pixel tests to use the same fuzzy matching meta-tags as WPT tests. It also
>>>>> shows the max color channel difference and total number of different pixels
>>>>> as image diff stats in results.html
>>>>> <https://test-results.appspot.com/data/layout_results/linux-rel/1073794/blink_web_tests%20%28with%20patch%29/layout-test-results/results.html>.
>>>>>
>>>>> With these capabilities in place, we would like to research further to see
>>>>> if we can set up some general fuzzy match rules and help Blink developers
>>>>> identify flaky tests that can potentially be resolved by adjusting fuzzy
>>>>> matching rules. Currently there are quite a few web tests that are flaky
>>>>> due to a slight image mismatch, which should have been tolerated. Suppose
>>>>> we set up a general fuzzy matching rule, something like:
>>>>>
>>>>> <meta name="fuzzy" content="0-1;0-1000">
>>>>>
>>>>> This instructs the image comparison web tests that if the color channel
>>>>> and pixel diff fall within the range of the rule, we can ignore the diff
>>>>> and pass the test. This way we can reduce test flakiness while still
>>>>> maintaining test accuracy without missing a real bug.
>>>>>
>>>>> We want to ask you some quick survey questions to help us make design
>>>>> decisions: whether it makes sense to set up a universal across-the-board
>>>>> fuzzy match tolerance rule for all blink web tests, or whether we should
>>>>> make the rules more specific to individual tests or test sets.
>>>>>
>>>>> 1. Is a universal fuzzy match tolerance rule acceptable for the web
>>>>>    tests in your area?
>>>>>
>>>>>    a) If the answer is yes, what is the acceptable range of max
>>>>>       color channel and pixel diff for your tests?
>>>>>    b) If the answer is no, please share your reasons.
>>>>>
>>>>> 2. Do you prefer fuzzy matching rule adjustment at a per-test or per
>>>>>    test set level based on the pixel difference numbers shown in
>>>>>    results.html?
>>>>>
>>>>> Here is some sample data to help you choose. We recently collected data
>>>>> from blink_web_tests results on the linux-test builder, showing the
>>>>> distribution of color channel maxDifference and totalPixels diff for
>>>>> failing/flaky blink_web_tests
>>>>> (Note: over 70% of tests in the color channel maxDifference 0-10 range
>>>>> have maxDifference=1):
>>>>>
>>>>> Color channel maxDifference range    Fail test count
>>>>> 0-10                                 98
>>>>> 11-100                               31
>>>>> 101-200                              28
>>>>> 201-260                              111
>>>>>
>>>>> totalPixels diff range               Fail test count
>>>>> 0-100                                30
>>>>> 100-1000                             57
>>>>> 1000-10,000                          99
>>>>> 10,000-100,000                       66
>>>>> 100,000-1,000,000                    16
>>>>>
>>>>> Let me know if you have any questions, looking forward to hearing from
>>>>> you!
>>>>>
>>>>> Vivian
>>>>> on behalf of Chrome-Blink-EngProd
>>>>>
>>>>> --
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "blink-dev" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>> an email to [email protected].
>>>>> To view this discussion on the web visit
>>>>> https://groups.google.com/a/chromium.org/d/msgid/blink-dev/CAPCqkTs-L5u22-Xp5U_LeBdLP%3D%2BTDH1KGv8MTmtKQFRcANCZJg%40mail.gmail.com
>>>>> <https://groups.google.com/a/chromium.org/d/msgid/blink-dev/CAPCqkTs-L5u22-Xp5U_LeBdLP%3D%2BTDH1KGv8MTmtKQFRcANCZJg%40mail.gmail.com?utm_medium=email&utm_source=footer>.
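
For readers following the rule syntax discussed in this thread, here is a minimal Python sketch of how a rule such as content="0-1;0-1000" could be checked against the maxDifference and totalPixels stats shown in results.html. It assumes the WPT-style reading of the positional shorthand: the first range is the allowed maximum per-channel difference, the second is the allowed number of differing pixels, and both ranges are inclusive. The names FuzzyRule, parse_fuzzy, and is_tolerated and the sample test names are hypothetical; this is not the code the Blink test runner uses.

```python
from typing import NamedTuple, Tuple


class FuzzyRule(NamedTuple):
    # Inclusive range for the largest per-channel color difference (0-255).
    max_difference: Tuple[int, int]
    # Inclusive range for the number of pixels that differ.
    total_pixels: Tuple[int, int]


def _parse_range(text: str) -> Tuple[int, int]:
    # Accept "LO-HI" or a single number "N", which is read as the range N-N.
    low, _, high = text.strip().partition("-")
    return (int(low), int(high)) if high else (int(low), int(low))


def parse_fuzzy(content: str) -> FuzzyRule:
    # Assumption: only the positional shorthand "A-B;C-D" is handled here.
    first, second = content.split(";")
    return FuzzyRule(_parse_range(first), _parse_range(second))


def is_tolerated(rule: FuzzyRule, max_difference: int, total_pixels: int) -> bool:
    # The diff is tolerated only when both stats fall inside their ranges.
    low_d, high_d = rule.max_difference
    low_p, high_p = rule.total_pixels
    return low_d <= max_difference <= high_d and low_p <= total_pixels <= high_p


if __name__ == "__main__":
    rule = parse_fuzzy("0-1;0-1000")
    # Hypothetical (maxDifference, totalPixels) stats for three tests.
    samples = {
        "example/off-by-one.html": (1, 120),
        "example/many-pixels.html": (1, 50000),
        "example/large-channel-diff.html": (37, 4),
    }
    for name, (diff, pixels) in samples.items():
        print(name, is_tolerated(rule, diff, pixels))
```

Because both conditions must hold, widening only the pixel range (as in content="0-1;0-1000000") still rejects any diff whose per-channel difference exceeds 1, which is the property the maxDifference=0-1 proposal relies on.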
