How much context does CF_HTML really need?

Henri Sivonen Tue, 31 Oct 2017 02:47:12 -0700

(Context: I'm trying to understand the requirements for our
serializers in case we rewrite them [in Rust].)


The HTML fragment parsing algorithm can have only one context node.
The context is never a chain of nodes towards the root, since such
a thing wouldn't affect the result per the HTML parsing algorithm.

However, when the HTML parsing algorithm is in the non-fragment mode,
some tags get ignored without an appropriate parent, so e.g. to
represent <td> in the non-fragment mode, you need to include <table>,
etc. But that's about it.

The Windows CF_HTML clipboard format,
https://msdn.microsoft.com/en-us/library/windows/desktop/ms649015(v=vs.85).aspx
, represents fragments by designating them in a full HTML document, so
what are logically fragments have to work with non-fragment parsing.

This indicates that when we export a fragment to the clipboard, we
should serialize its parent if not table-related or reconstruct a full
table if table-related.
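
To make that rule concrete, here is a rough Rust sketch under stated
assumptions. The function wrap_in_minimal_context is hypothetical, it
takes only the parent's tag name (attributes are dropped), and it
always rebuilds a row's section as <tbody>, which loses the
distinction between <thead>, <tbody> and <tfoot>. It is an
illustration of the idea, not the existing serializer.

```rust
/// Wraps `fragment` in the least ancestor markup needed for its start
/// tags to survive non-fragment parsing, given the tag name of the
/// fragment's parent element.
fn wrap_in_minimal_context(parent: &str, fragment: &str) -> String {
    match parent {
        // Cells: rebuild row, section and table around them.
        "tr" => format!("<table><tbody><tr>{}</tr></tbody></table>", fragment),
        // Rows or columns: rebuild their container and the table.
        "tbody" | "thead" | "tfoot" | "colgroup" => {
            format!("<table><{p}>{f}</{p}></table>", p = parent, f = fragment)
        }
        // Direct children of the table only need the table itself.
        "table" => format!("<table>{}</table>", fragment),
        // Not table-related: the parent alone is enough context.
        _ => format!("<{p}>{f}</{p}>", p = parent, f = fragment),
    }
}
```

With this, a copied cell whose parent is a <tr> becomes
<table><tbody><tr><td>1</td></tr></tbody></table>, while a fragment
inside a <div> is wrapped in just that <div>.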

Yet, it seems that we serialize much more ancestor context.

Is there a good reason to? For example, does Microsoft Office (our old
bugs suggest that Excel is the pickiest consumer) or other CF_HTML
consumers on Windows care about more context than the standard HTML
parsing algorithm? What could consumers possibly do with knowledge
about ancestors beyond parent or the nearest <table>? (I'm ignoring
SVG and MathML for the moment.)

OTOH, it seems that we include only some element types in the context
(https://searchfox.org/mozilla-central/source/dom/base/nsDocumentEncoder.cpp#1540).
It's unclear to me why. The first revision of the list came from jst
during the Netscape 6 crunch without an explanation either in Bugzilla
or code comments. (https://bugzilla.mozilla.org/show_bug.cgi?id=50742)

Does anyone know why?

-- 
Henri Sivonen
hsivo...@hsivonen.fi
https://hsivonen.fi/
_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform
