> On 18 Jun 2015, at 4:44 am, Franz de Copenhague 
> <franzdecopenha...@outlook.com> wrote:
> 
> I think I commented on this previously: use a data-* attribute to persist 
> the DFNode sequence number, instead of the HTML id. Using the id is a 
> limitation for the client app, which cannot then manipulate the HTML id for 
> its own purposes.
> 
> http://www.w3.org/TR/2011/WD-html5-20110525/elements.html#embedding-custom-non-visible-data-with-the-data-attributes
> 

I think in principle either way would work fine. The id attribute is supposed to 
be unique across all elements in a document, and I would expect most programs 
that manipulate the HTML to keep the id attributes as-is (that’s just an 
educated guess, it’s not guaranteed).
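
As a rough illustration of the data-* alternative, the sketch below indexes 
elements by a sequence-number attribute and leaves the id free for the client 
app. The attribute name data-dfseq, the helper index_by_seq, and the sample 
markup are all made up for this example; nothing here is an existing API.

```python
import xml.etree.ElementTree as ET

# Hypothetical attribute name, chosen only for this illustration.
SEQ_ATTR = "data-dfseq"

# Sample well-formed markup: each element carries both an id and a sequence number.
html = (
    '<body>'
    '<p id="intro" data-dfseq="1">See <a href="#fig1">Figure 1</a>.</p>'
    '<figure id="fig1" data-dfseq="2"><figcaption>Figure 1</figcaption></figure>'
    '</body>'
)

def index_by_seq(root):
    """Map each sequence number to its element, without looking at id."""
    return {
        int(el.get(SEQ_ATTR)): el
        for el in root.iter()
        if el.get(SEQ_ATTR) is not None
    }

root = ET.fromstring(html)
nodes = index_by_seq(root)

# The client app can rename an id for its own purposes; the lookup by
# sequence number is unaffected.
nodes[1].set("id", "renamed-by-client")
```

Note that the hyperlink to #fig1 still depends on the figure's id, which is 
the cross-reference case discussed next.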

We also use the ids for cross-references (e.g. if you have a labeled figure, 
and a hyperlink saying “See Figure 1”), so at minimum we need to keep them for 
those purposes (though that could be considered a separate requirement from 
that of identifiers for bi-directional transformation).

Ultimately what I’d like to achieve though is to avoid the need for the id 
attributes for BDT purposes entirely, because there will probably be some use 
cases where it’s not possible to maintain them. For example, someone converts a 
Word document to Markdown, because that’s what they prefer to use. After 
modifying the file, they update the Word document and send it back to their 
“unenlightened” colleague. Markdown doesn’t support id attributes, so we can’t 
rely on those to work out which parts of the Markdown file (reconstituted 
internally as an HTML file before the update actually takes place) correspond 
to which parts of the original Word document.

The strategy I think we could use here is to essentially do a diff - but it has 
to be more intelligent than a simple line-based diff, because of the tree 
structure. I experimented with this a while back using the Myers diff algorithm 
[1] (which assumes a sequence of items, not a tree). However, my attempts to 
modify it to deal with trees were unsuccessful. There’s been some other research done 
on tree diff algorithms that I haven’t had a chance to look into yet, but I’m 
hopeful we may be able to find or develop a suitable algorithm, at least for 
the case of languages like Markdown as in the use case above.
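
To make the sequence-versus-tree limitation concrete, here is a minimal sketch 
of the greedy algorithm from [1], computing only the length of the shortest 
edit script between two flat sequences. The function name and the sample 
inputs are made up for this example; it is not the code from the experiment 
mentioned above, and it does nothing about tree structure.

```python
def myers_edit_distance(a, b):
    """Length of the shortest insert/delete script turning a into b.

    Greedy Myers approach: for each edit count d, track the furthest x
    reached on every diagonal k = x - y of the edit graph.
    """
    n, m = len(a), len(b)
    furthest = {1: 0}  # diagonal k -> furthest x reached so far
    for d in range(n + m + 1):
        for k in range(-d, d + 1, 2):
            if k == -d or (k != d and furthest[k - 1] < furthest[k + 1]):
                x = furthest[k + 1]        # step down: insert an item from b
            else:
                x = furthest[k - 1] + 1    # step right: delete an item from a
            y = x - k
            # Follow the diagonal for free while items match.
            while x < n and y < m and a[x] == b[y]:
                x += 1
                y += 1
            furthest[k] = x
            if x >= n and y >= m:
                return d
    return n + m


# Block-level items of a document flattened into a list (illustrative data).
old_blocks = ["Title", "Intro", "Figure 1", "Conclusion"]
new_blocks = ["Title", "Intro (edited)", "Figure 1", "Conclusion"]
distance = myers_edit_distance(old_blocks, new_blocks)  # one delete + one insert
```

Because the input is a flat list, a change such as moving a paragraph into a 
list item or a table cell only shows up as unrelated deletions and insertions, 
which is why something tree-aware is needed.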

[1] http://www.xmailserver.org/diff2.pdf

—
Dr Peter M. Kelly
pmke...@apache.org

PGP key: http://www.kellypmk.net/pgp-key
(fingerprint 5435 6718 59F0 DD1F BFA0 5E46 2523 BAA1 44AE 2966)
