Contact [email protected]
Explainer
Make XML parsing memory safe in Chrome: replace usage of the unsafe C
library libxml2 with Rust-based XML parsing built on the Rust xml crate
<https://crates.io/crates/xml>. This eliminates a class of memory
corruption security issues in XML parsing.
Specification
Several web specs affected, wherever we parse XML: DOMParser JS API,
XMLHttpRequest, SVG, XHTML.
Design docs
Internal design doc
<https://docs.google.com/document/d/1ubSlZfl7kLUSfTUKEVRNM-bCsQXjVIiyTgxlyOleTDM/edit?tab=t.0#heading=h.xejx24r20h6z>
Summary

Moving to a Rust based XML parser is part of our strategy to address
security issues and the difficult security track record of libxml2 and
libxslt.

Unfortunately, for XSLT support, libxml2 and libxslt are tightly
entangled. However, we can already ship the Rust based XML parser for
scenarios without XSLT.

In this intent to experiment, I propose to roll out the Rust-based XML
parser for scenarios where no XSLT processing is required:


   1. DOMParser Web API
   2. Accessing responseXML of XMLHttpRequest
   3. Likely: SVG Standalone Images (i.e. accessing an image.svg document
   directly as a top-level navigation)
   4. Likely: SVG external images (A main document embedding an SVG as an
   external image resource).

For details of "Likely", see Risks below.
Blink component
Blink>DOM, Blink>SVG

TAG review
None, no functional change expected.
Risks
Interoperability and Compatibility

*XML Parsing and Serialization*

In implementing the Rust-based XML parser, we tested against ~400 WPT and
internal web tests and brought the failures down to close to 0. For
DOMParser and XMLHttpRequest tests, the new parser is already permanently
running on bots.

Niche serialization issues remain: due to API restrictions of the XML
parser, the new Rust parser may occasionally insert an extra xmlns:
attribute on the root element. This does not affect document semantics.
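
To illustrate why an extra namespace declaration on the root is harmless,
here is a minimal sketch using Python's standard xml.etree.ElementTree. It
is not Chromium code; the documents, the "extra" prefix and the urn:example
URIs are made up for illustration. It compares the namespace-expanded
element names, attributes and text of two serializations that differ only
in a redundant declaration.

```python
import xml.etree.ElementTree as ET

# Two serializations of the same document. The second carries a redundant
# namespace declaration on the root (prefix and URIs are hypothetical).
ORIGINAL = '<root xmlns="urn:example:a"><child id="1">text</child></root>'
WITH_EXTRA = ('<root xmlns="urn:example:a" xmlns:extra="urn:example:b">'
              '<child id="1">text</child></root>')


def expanded_tree(xml_text):
    """Return (qualified tag, sorted attributes, text) per element, in document order."""
    root = ET.fromstring(xml_text)
    return [(el.tag, sorted(el.attrib.items()), el.text) for el in root.iter()]


# The unused declaration does not change any element's expanded name,
# attributes or text content.
same_semantics = expanded_tree(ORIGINAL) == expanded_tree(WITH_EXTRA)
print(same_semantics)
```

A byte-for-byte comparison of serialized output would still differ, so
tests that compare serialized strings exactly could notice the extra
declaration.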

From WPT tests we do not see other compatibility issues, and we believe we
can progress to real-world testing for the non-XSLT scenarios.

*Inline XSLT in SVG*

There is a minor theoretical risk regarding standalone SVG images (scenario
3 above) that utilize inline XSLT to transform raw XML data into SVG.
Example <http://rtsh.es/test/xml/svg_xslt_img.html>

Data: UseCounters (XSLPIInSVGImage and XSLPIInSVGImageStandalone) currently
show 0% usage in Canary and Dev.
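
The counter names suggest they track XSL processing instructions in SVG
images. As a rough sketch of what such a check could look like (an
assumption for illustration, not how Chromium implements the UseCounters),
the following Python snippet looks for a top-level xml-stylesheet
processing instruction whose type indicates XSLT. The function name, the
sample documents and the set of accepted media types are hypothetical.

```python
import re
from xml.dom import minidom

# Assumed set of media types treated as XSLT for this sketch.
XSLT_TYPES = {"text/xsl", "application/xslt+xml"}


def has_xslt_pi(xml_text):
    """Return True if a top-level xml-stylesheet PI declares an XSLT stylesheet."""
    doc = minidom.parseString(xml_text)
    for node in doc.childNodes:
        if node.nodeType != node.PROCESSING_INSTRUCTION_NODE:
            continue
        if node.target != "xml-stylesheet":
            continue
        # PI data is free text such as: type="text/xsl" href="to-svg.xsl"
        match = re.search(r'type\s*=\s*["\']([^"\']+)["\']', node.data)
        if match and match.group(1).strip().lower() in XSLT_TYPES:
            return True
    return False


PLAIN_SVG = '<svg xmlns="http://www.w3.org/2000/svg"><rect width="1" height="1"/></svg>'
XSLT_DATA = '<?xml-stylesheet type="text/xsl" href="to-svg.xsl"?><data><item/></data>'
CSS_SVG = ('<?xml-stylesheet type="text/css" href="style.css"?>'
           '<svg xmlns="http://www.w3.org/2000/svg"/>')

print(has_xslt_pi(PLAIN_SVG), has_xslt_pi(XSLT_DATA), has_xslt_pi(CSS_SVG))
```

Only documents matching a check of this kind would need the libxml2 and
libxslt path; plain SVG and CSS-styled SVG would not.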

Current State: This works in Chrome for standalone docs but not for
externally referenced images (matching Firefox). Safari supports both.

Signal: Mozilla and Apple both have expressed support for deprecating XSLT
in general, not only in this context (WHATWG Issue #11523).

*Gecko*: No signal

*WebKit*: No signal, will file deprecation proposal to disallow XSLT for
SVG generation

*Web developers*: No signal

*Other signals*: Mozilla and Apple support deprecating XSLT from
discussions in WHATWG. https://github.com/whatwg/html/issues/11523

Security

Changing to Rust based XML parsing eliminates a class (and historic chain)
of memory corruption bugs in XML parsing. Libxml2 had unstable
maintainership, and delays in responding to security issues.

WebView application risks

Does this intent deprecate or change behavior of existing APIs, such that
it has potentially high risk for Android WebView-based applications?

No, except for the likely non-existent usage of XSLT in SVG.


Goals for experimentation

I propose rolling out to 50% of Dev, Canary and Beta, then progressing to
1% on stable after observing the new use counter for XSLT usage in SVG and
monitoring the perf histogram
Blink.XMLParsing.NonXsltXmlParsingTime.Combined
<https://source.chromium.org/chromium/chromium/src/+/main:third_party/blink/renderer/core/dom/document.cc;l=7986?q=document.cc>.

The new Rust parser is not yet on par with the performance of the
libxml2-based parser: it shows a 50% regression in the blink_perf.parser
microbenchmark, which measures throughput while parsing a heavy 3MB XML
document (tracked in https://crbug.com/470367156).
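
For readers unfamiliar with throughput microbenchmarks, the general shape
of such a measurement can be sketched as follows. This is not the
blink_perf.parser benchmark; it uses Python's standard library parser on a
synthetic document of roughly 3MB, and the helper names and document
content are assumptions for illustration only.

```python
import time
import xml.etree.ElementTree as ET


def make_document(target_bytes):
    """Build a synthetic XML document of at least target_bytes bytes."""
    item = '<item id="1"><name>example</name><value>42</value></item>'
    count = target_bytes // len(item) + 1
    return ("<items>" + item * count + "</items>").encode("utf-8")


def throughput_mb_per_s(parse, data, runs=3):
    """Return the best-of-runs parse throughput in MB/s."""
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        parse(data)
        best = min(best, time.perf_counter() - start)
    return len(data) / (1024 * 1024) / best


data = make_document(3 * 1024 * 1024)
result = throughput_mb_per_s(ET.fromstring, data)
print(f"{result:.1f} MB/s")
```

Comparing two parsers then amounts to calling throughput_mb_per_s with
each parser's entry point on the same input and comparing the results.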

Rolling out at a small percentage to stable helps us gather the metrics
required to decide whether this has real-world performance implications.
We will mainly look at LCP and monitor the UMA histogram for XML parser
timing for major shifts.
Ongoing technical constraints

Performance. The microbenchmark finding shows that the Rust parser does
not yet match the performance of the C parser. Finding out whether this
matters in practice is one goal of this experiment.
Debuggability

No issues. Both libxml2 and the Rust parser parse into internal DOM
structures, which remain accessible in source view as before.
Will this feature be supported on all six Blink platforms (Windows, Mac,
Linux, ChromeOS, Android, and Android WebView)?
Yes

Is this feature fully tested by web-platform-tests
<https://chromium.googlesource.com/chromium/src/+/main/docs/testing/web_platform_tests.md>?
Yes.

Flag name on about://flags
Not in about://flags.

Finch feature name
XMLRustForNonXslt

Requires code in //chrome?
False

Tracking bug
https://crbug.com/466303347

Estimated milestones
Experimental rollout to 1% for M147.

Link to entry on the Chrome Platform Status

Rust XML Parser rollout <https://chromestatus.com/feature/5309598397497344>

Deprecate XSLT in SVG
<https://chromestatus.com/feature/5143784390262784>
