> On 18 Jun 2015, at 4:44 am, Franz de Copenhague <franzdecopenha...@outlook.com> wrote:
>
> I think I commented on this previously: use a data-* attribute, instead of the
> HTML id, to persist the DFNode sequence number. Using the id is a limitation
> for the client app, which then cannot manipulate the HTML id for its own
> purposes.
>
> http://www.w3.org/TR/2011/WD-html5-20110525/elements.html#embedding-custom-non-visible-data-with-the-data-attributes
I think in principle either way would work fine. The id attribute is supposed to be unique across all elements in a document, and I would expect most programs that manipulate the HTML to keep the id attributes as-is (that’s just an educated guess; it’s not guaranteed). We also use the ids for cross-references (e.g. if you have a labeled figure and a hyperlink saying “See Figure 1”), so at minimum we need to keep them for those purposes (though that could be considered a separate requirement from that of identifiers for bi-directional transformation).

Ultimately, though, what I’d like to achieve is to avoid the need for id attributes for BDT purposes entirely, because there will probably be some use cases where it’s not possible to maintain them. For example, someone converts a Word document to Markdown, because that’s what they prefer to use. After modifying the file, they update the Word document and send it back to their “unenlightened” colleague. Markdown doesn’t support id attributes, so we can’t rely on them to work out which parts of the Markdown file (reconstituted internally as an HTML file before the update actually takes place) correspond to which parts of the original Word document.

The strategy I think we could use here is essentially to do a diff, but it has to be more intelligent than a simple line-based diff because of the tree structure. I experimented with this a while back using the Myers diff algorithm [1], which assumes a sequence of items, not a tree. However, my attempts to modify it to deal with trees were unsuccessful. There’s been other research on tree diff algorithms that I haven’t had a chance to look into yet, but I’m hopeful we can find or develop a suitable algorithm, at least for languages like Markdown as in the use case above.

[1] http://www.xmailserver.org/diff2.pdf

-- 
Dr Peter M. Kelly
pmke...@apache.org
PGP key: http://www.kellypmk.net/pgp-key
(fingerprint 5435 6718 59F0 DD1F BFA0 5E46 2523 BAA1 44AE 2966)
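For readers unfamiliar with it, the Myers algorithm referenced in [1] finds a shortest edit script between two flat sequences. The Python sketch below is an illustration only, not the code from the experiment described above. It assumes each document has already been flattened into a list of block-level paragraph strings (a hypothetical representation), and `myers_diff` is a hypothetical helper name. Because the input is a flat list, any nesting (e.g. paragraphs inside list items or table cells) is invisible to it, which is the limitation that motivates looking at tree diff algorithms.

```python
from typing import List, Sequence, Tuple

# Each edit is ("=", item) for a match, ("-", item) for a deletion from `a`,
# or ("+", item) for an insertion from `b`.
Edit = Tuple[str, str]


def _forward_trace(a: Sequence[str], b: Sequence[str]) -> Tuple[List[List[int]], int]:
    """Greedy forward pass: returns a snapshot of V taken before each depth d."""
    n, m = len(a), len(b)
    max_d = n + m
    offset = max_d  # diagonal k is stored at v[k + offset]
    v = [0] * (2 * max_d + 2)
    trace: List[List[int]] = []
    for d in range(max_d + 1):
        trace.append(list(v))
        for k in range(-d, d + 1, 2):
            if k == -d or (k != d and v[offset + k - 1] < v[offset + k + 1]):
                x = v[offset + k + 1]  # move down: insert from b
            else:
                x = v[offset + k - 1] + 1  # move right: delete from a
            y = x - k
            # Follow the "snake" of matching items along the diagonal.
            while x < n and y < m and a[x] == b[y]:
                x, y = x + 1, y + 1
            v[offset + k] = x
            if x >= n and y >= m:
                return trace, offset
    raise RuntimeError("unreachable: D never exceeds len(a) + len(b)")


def myers_diff(a: Sequence[str], b: Sequence[str]) -> List[Edit]:
    """Shortest edit script turning flat sequence `a` into `b`."""
    trace, offset = _forward_trace(a, b)
    x, y = len(a), len(b)
    script: List[Edit] = []
    # Walk back from (n, m) to (0, 0), one depth at a time.
    for d in range(len(trace) - 1, -1, -1):
        v = trace[d]
        k = x - y
        if k == -d or (k != d and v[offset + k - 1] < v[offset + k + 1]):
            prev_k = k + 1
        else:
            prev_k = k - 1
        prev_x = v[offset + prev_k]
        prev_y = prev_x - prev_k
        while x > prev_x and y > prev_y:  # diagonal steps are matches
            script.append(("=", a[x - 1]))
            x, y = x - 1, y - 1
        if d > 0:
            if x == prev_x:
                script.append(("+", b[y - 1]))  # vertical step
            else:
                script.append(("-", a[x - 1]))  # horizontal step
        x, y = prev_x, prev_y
    script.reverse()
    return script


if __name__ == "__main__":
    old = ["Intro", "Old wording", "Conclusion"]
    new = ["Intro", "New wording", "Conclusion"]
    for op, item in myers_diff(old, new):
        print(op, item)
    # = Intro
    # - Old wording
    # + New wording
    # = Conclusion
```

In this sketch, items tagged "=" are the ones whose identity could plausibly be carried across the round trip without an id attribute, while "-"/"+" pairs mark edited content. A tree-aware approach would need to make the same kind of matching decision at every level of nesting, not just over one flat list.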