I will say that personally, as an author, I am accustomed to using special characters freely in my regex character classes, and hearing that the HTML spec is making a backwards-incompatible change that would prevent me from doing so is kind of surprising; I would probably find it a little bit annoying. Especially since it puts the HTML spec out of sync with every other major regex implementation that I know of, meaning that new authors would have a hard time checking their patterns for validity on any of the common regex visualization/testing sites.
Additionally, what we're considering as "not real breakage" here is still a degraded experience for users, right? As well as extra work for developers whose pages previously had perfectly acceptable UX (warning users about invalid values prior to submission) and who now have pages where that feature no longer functions.

Going off of the discussion on backwards compatibility in the original TC39 proposal, https://github.com/tc39/proposal-regexp-v-flag#is-the-new-syntax-backwards-compatible-do-we-need-another-regular-expression-flag it feels like a lot of the considerations in that discussion boiled down to "flags are an accessible and convenient way to opt in". But in this case the opposite is true: flags are completely unexposed by the HTML spec, so there's no easy and accessible way to opt in to the new behavior. It feels like some of the other options they considered (for example, a way of specifying flags "in pattern" like some engines allow for, or a prefix like \U[ to allow for set operations) would make much more sense in the HTML usage context, where flags aren't readily available to authors.

I guess if I had to summarize my concerns, it's that by making this change for all users with no opt-in, we're adopting the letter of the TC39 proposal but not the spirit, which leaves the HTML spec out of step with user expectations. However, I do understand that the actual potential for breakage here is pretty small and that it's the price we pay for a more featureful and expressive regex language.

Have the HTML spec editors considered any small changes that would improve backwards compatibility and usability in this area, like falling back to parsing regexes as /u if /v parsing fails? True, that would encode legacy behavior in the platform where it's arguably not necessary, but I feel like the improvement in developer ergonomics is worth the tradeoff.
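A minimal sketch of that fallback idea in plain JavaScript, as an illustration only. It is not spec text and not how any browser compiles the `pattern` attribute: `compilePattern` and `isValid` are hypothetical names, and the `^(?:...)$` anchoring is a simplification of what the HTML Standard describes. If the runtime does not support the v flag at all, the first attempt throws and the sketch simply ends up using u.

```javascript
// Hypothetical helper, not spec text: prefer the v flag, fall back to u.
function compilePattern(source) {
  for (const flags of ["v", "u"]) {
    try {
      new RegExp(source, flags); // validate the bare pattern first
      return new RegExp(`^(?:${source})$`, flags); // then anchor it
    } catch (error) {
      // Not compilable in this mode; try the next (older) one, if any.
    }
  }
  // Neither mode accepts the pattern: treat the attribute as absent.
  return null;
}

// Hypothetical validity check: an uncompilable pattern never rejects a value.
function isValid(source, value) {
  const regexp = compilePattern(source);
  return regexp === null || regexp.test(value);
}

// `[(]` compiles under u but is an early error under v,
// because `(` must be escaped inside a v-mode character class.
console.log(isValid("[(]", "(")); // true
console.log(isValid("[(]", "x")); // false: the u fallback keeps validating
console.log(isValid("[0-9][A-Z]{3}", "1ABC")); // true
```

Without the fallback step, `[(]` would fail to compile under v and every value, including `x`, would be reported as valid.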
On Wednesday, April 19, 2023 at 3:28:13 PM UTC-5 rby...@chromium.org wrote:

On Wed, Apr 19, 2023 at 3:41 PM Philip Jägenstedt <foo...@chromium.org> wrote:

I wonder if we can get enough confidence with less work than investigating 40 randomly chosen sites from UseCounter hits. This is a population proportion problem, and https://sample-size.net/confidence-interval-proportion/ is a useful tool. If you check 40 cases and find no breakage (N=40, x=0) that gives us 95% confidence that breakage is less than 7.2% of samples in this data set.

Whether it's useful to check that many depends on the value of the use counter. Is https://chromestatus.com/metrics/feature/timeline/popularity/4463 the right use counter, and has it reached stable yet? Why is it marked as obsolete?

For purposes of illustration, let's use 0.04% from earlier in the thread and say we want to be (95%) confident that real breakage is less than 0.01%. Then we just need to get below 25% in the linked tool, and checking 11 samples and finding nothing is enough to do this.

I like your more sophisticated math, but it's <0.001% that we want I'm afraid, not 0.01%. So, if I'm following your instructions right, that's ~42 samples to have 95% confidence :-) Of course 0.001% is just a rough guideline that has often proven to be too high or too low. So this is all a judgement call anyway.

On Wed, Apr 19, 2023 at 5:43 PM Mathias Bynens <mt...@google.com> wrote:

Thanks for the guidance, Rick. I’ve prepared a CL moving the flag to status=experimental <https://chromium-review.googlesource.com/c/chromium/src/+/4447958> and I can commit to investigating 40 unique UseCounter hits and summarizing my findings. Fingers crossed the trend of “no actual breakage detected” continues. I’ll keep you posted.

On Wed, Apr 19, 2023 at 5:26 PM Rick Byers <rby...@chromium.org> wrote:

Thanks for doing a thorough compat analysis of this, Mathias.
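For anyone who wants to redo the sample-size arithmetic from the exchange above without the linked tool, here is a small sketch. It assumes a one-sided exact (Clopper-Pearson) bound for the case of zero observed failures, which works out to 1 - alpha^(1/n); whether the linked tool computes exactly this is an assumption, but the formula reproduces the 7.2% and 11-sample figures quoted. The function names are hypothetical.

```javascript
// Upper limit on the true proportion when 0 of n samples show breakage,
// at confidence 1 - alpha (one-sided exact bound): 1 - alpha^(1/n).
function upperBoundZeroHits(n, alpha = 0.05) {
  return 1 - Math.pow(alpha, 1 / n);
}

// Smallest n for which finding no breakage brings that bound
// down to `target` or below.
function samplesNeeded(target, alpha = 0.05) {
  return Math.ceil(Math.log(alpha) / Math.log(1 - target));
}

console.log(upperBoundZeroHits(40).toFixed(3)); // "0.072"
console.log(samplesNeeded(0.25)); // 11
```

The bound applies to the share of UseCounter hits that break, so it still has to be multiplied by the UseCounter rate to estimate a share of page loads.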
I can totally see this being one where all the examples we can find don't seem to cause breakage in practice. I know it's a lot, but if we looked at 40 random examples and found none of them to break, that would suggest an upper bound of <0.001% of pages impacted (probably much lower) and I'd be OK giving this a shot with a finch killswitch ready in case of reports of serious breakage. Does that sound reasonable to you?

Also feel free to set your flag <https://source.chromium.org/chromium/chromium/src/+/main:third_party/blink/renderer/platform/runtime_enabled_features.json5;l=1912?q=HTMLPatternRegExpUnicodeSets%20file:.json5&ss=chromium> to status=experimental, that'll get us some additional usage coverage (from the small population that runs with --enable-experimental-web-platform-features) and also signal that this is close to becoming shipping behavior.

Rick

On Mon, Apr 17, 2023 at 7:03 AM 'Mathias Bynens' via blink-dev <blin...@chromium.org> wrote:

So far, none of the UseCounter hits I investigated constitute any actual breakage. The vast majority of hits seem to be login forms backed by server-side validation. I’ll keep looking though.

In the meantime, this feature is now <https://chromium-review.googlesource.com/c/chromium/src/+/4414859> available behind the `--enable-blink-features=HTMLPatternRegExpUnicodeSets` flag (disabled by default).

On Wednesday, April 5, 2023 at 5:53:10 PM UTC+2 Mathias Bynens wrote:

On Wed, Apr 5, 2023 at 5:23 PM Alex Russell <sligh...@chromium.org> wrote:

I don't understand why TAG review is not applicable for this intent.

Fair enough. I’ve filed a TAG review request here: https://github.com/w3ctag/design-reviews/issues/832 I’ll update the ChromeStatus entry to refer to it.

On Tuesday, April 4, 2023 at 5:21:16 AM UTC-7 mt...@google.com wrote:

Thanks to the UseCounter + UKM + M112 hitting Stable, more results are starting to come in.
I’ll be collecting public examples of potential incompatibilities here: https://bugs.chromium.org/p/chromium/issues/detail?id=1412729#c11 So far 0 out of the 2 examples cause any actual breakage; fingers crossed that trend continues.

On Mon, Apr 3, 2023 at 10:26 AM Philip Jägenstedt <foo...@chromium.org> wrote:

I took a look at https://github.com/whatwg/html/pull/7908 and it looks like there's agreement to merge it, but it's waiting on this intent to be approved. Normally we block in the other direction, but that's fine, as long as the spec change is merged.

Looks like there's broad support for this change, and it's just a question of the site compat risk. ~0.04% as an upper bound is quite high. Can we wait until the use counter is in stable and look at a random set of sites hitting the use counter to determine what the real-world breakage looks like?

On Fri, Mar 31, 2023 at 5:07 PM 'Mathias Bynens' via blink-dev <blin...@chromium.org> wrote:

On Fri, Mar 31, 2023 at 4:35 PM Mike Taylor <mike...@chromium.org> wrote:

Hey Mathias,

On 3/31/23 5:56 AM, Mathias Bynens wrote:

Contact emails
mat...@chromium.org

Specification
https://github.com/whatwg/html/pull/7908

Summary
The <input pattern> attribute allows developers to specify a regular expression pattern against which the input’s values are checked for validity.

    <label>
      Part number:
      <input pattern="[0-9][A-Z]{3}" name="part" title="A part number is a digit followed by three uppercase letters.">
    </label>

When the pattern attribute was first implemented, these regular expressions were compiled without any RegExp flags. In 2014, the HTML Standard changed this by implicitly enabling the u flag for the pattern attribute, enabling better Unicode support (including support for Unicode character properties like \p{Letter}). This change shipped in Chrome 53.
<https://chromestatus.com/feature/4753420745441280>

Now, we’re taking this to the next level by enabling the new RegExp v flag <https://v8.dev/features/regexp-v-flag> instead of u, enabling the use of set notation, string literal syntax, and Unicode properties of strings.

(Context: The RegExp v flag is a JavaScript language feature which previously went through the Blink Intents process and shipped in Chrome 112 <https://chromestatus.com/feature/5144156542861312>. This new ChromeStatus entry is specifically about integrating it with the HTML pattern attribute.)

Blink component
Blink>Forms <https://bugs.chromium.org/p/chromium/issues/list?q=component:Blink%3EForms>

Search tags
unicode <https://chromestatus.com/features#tags:unicode>, regexp <https://chromestatus.com/features#tags:regexp>, pattern <https://chromestatus.com/features#tags:pattern>, validation <https://chromestatus.com/features#tags:validation>

TAG review

TAG review status
Not applicable

Risks

Interoperability and Compatibility
The spec patch at https://github.com/whatwg/html/pull/7908 lists the potentially breaking changes. Some patterns that previously would compile now throw an early error with the v flag, specifically those with a character class including either an unescaped special character or a double punctuator. We expect such patterns to be rare. To validate this assumption we’ve added a UseCounter called HTMLPatternRegExpUnicodeSetIncompatibilitiesWithUnicodeMode <https://chromestatus.com/metrics/feature/popularity#HTMLPatternRegExpUnicodeSetIncompatibilitiesWithUnicodeMode> in M112, which tracks patterns in any JavaScript u RegExps generated via the HTML pattern attribute that would throw if they were used with the v flag.

Importantly, note that any throwing pattern gracefully degrades: it simply behaves as if the pattern attribute wasn’t present, resulting in inputElement.validity.valid === true for any input value.
Consequently, the only compatibility risk is that some value/pattern combinations that would previously result in inputElement.validity.valid being false now result in it being true. Thus, for every UseCounter hit, it could still be that there is no actual breakage; the UseCounter just gives us the upper bound. The currently available data from Beta suggests the UseCounter hits for 0.0393% of Chrome page loads.

I'm somewhat curious to see how much that UseCounter will grow (if at all) when 112 goes to stable next week...

Me too, and FWIW I'd understand if you and the other API owners prefer to wait until there’s some data for Stable before responding to this Intent.

Do you have any concerns about certain inputs being sent to a server that might not have any backend validation, that would previously be prevented by the u-vintage validation?

That’s indeed the only scenario in which there would be breakage. So far we haven’t heard of such cases in the wild. (Arguably, such web pages are already broken, since DevTools could easily be used to remove the `pattern` attribute, or requests could be made with tools like `curl`.) FWIW, there was a similar discussion in this old blink-dev thread: https://groups.google.com/a/chromium.org/g/blink-dev/c/XUNMtri0tI4/m/mjPkwXKNAQAJ

I forgot to mention that we explicitly added a console warning in M112 for any `pattern` attribute values that would be affected by this change, to help developers prepare for the potential change.
One developer reported seeing the warning and adjusting their `pattern` attribute values accordingly, but it’s unclear whether inaction would have really broken their web page: https://bugs.chromium.org/p/chromium/issues/detail?id=1412729#c7

Gecko: Positive (Mozilla standards position request <https://github.com/mozilla/standards-positions/issues/745>, implementation tracking issue <https://bugzilla.mozilla.org/show_bug.cgi?id=pattern-v>)

WebKit: Positive (WebKit standards position request <https://github.com/WebKit/standards-positions/issues/132>, implementation tracking issue <https://bugs.webkit.org/show_bug.cgi?id=pattern-v>)

Web developers: No signals

Other signals:

Debuggability
The pattern attribute is already well-supported in DevTools and other tooling; no changes are necessary.

Will this feature be supported on all six Blink platforms (Windows, Mac, Linux, Chrome OS, Android, and Android WebView)?
Yes

Is this feature fully tested by web-platform-tests <https://chromium.googlesource.com/chromium/src/+/main/docs/testing/web_platform_tests.md>?
Pull Request: https://github.com/web-platform-tests/wpt/pull/38547

Flag name
N/A

Requires code in //chrome?
False

Tracking bug
https://bugs.chromium.org/p/chromium/issues/detail?id=1412729

Sample links
https://mathiasbynens.be/demo/pattern-u-vs-v

Estimated milestones
M114

Link to entry on the Chrome Platform Status
https://chromestatus.com/feature/5149507107422208

--
You received this message because you are subscribed to the Google Groups "blink-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to blink-dev+...@chromium.org.
To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/blink-dev/CADizRgaAq4FwzJbUqLQVo%2BQdd_V0PT7rBr510OGe8fenHA%3D3HQ%40mail.gmail.com <https://groups.google.com/a/chromium.org/d/msgid/blink-dev/CADizRgaAq4FwzJbUqLQVo%2BQdd_V0PT7rBr510OGe8fenHA%3D3HQ%40mail.gmail.com?utm_medium=email&utm_source=footer>.