On 2010-03-23, rgheck wrote:
> On 03/23/2010 05:22 AM, Guenter Milde wrote:
>> On 2010-03-22, rgheck wrote:

>>> As he said, this is highly non-trivial. And the better the website, the
>>> harder it is, since a good website will use semantic markup that is
>>> styled by CSS. Then what do you do?

>> Of course transform semantic markup to semantic markup. This implies
>> that the website uses "really good" markup (text with HTML markup
>> indicating its logical structure), not CSS-styled <div> and <span> soups.

> The difficulty is that HTML is very limited in what it is capable of
> marking, for the simple reason that there aren't very many tags.

This is why our task becomes easy if we decide to support HTML markup
only.

Such an HTML importer would stand in the middle between simple text import
(no structure/markup preserved) and the "annoying" fully-formatted pasting
of some word processors. The basic structure of the document (lists,
sections, emphasized and preformatted text) is preserved in a WYSIWYG way.

> LyX character styles, for example, would almost uniformly correspond to
> "span", except for the handful of obvious exceptions. That, it seems to
> me, is why "use div and span for everything" has become almost the
> norm. See e.g. elyxer's HTML output. LyX's is more flexible, because
> it is specifiable in the layout. But the problem remains.

Similar to the goal of achieving a lossless lyx->latex->lyx round-trip, we
could (as a further option) extend the HTML importer to support the
features/constructs of the native HTML export.

Günter
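
To make the "HTML markup only" idea concrete, here is a minimal sketch in Python using the standard html.parser module. The style names (Section, Standard, Itemize, ...) and the import_html helper are hypothetical labels chosen for illustration, not LyX's actual import code or file format. The point is only that a small tag table covers lists, sections, emphasized and preformatted text, while div, span and all class/style attributes are ignored.

```python
from html.parser import HTMLParser

# Hypothetical mapping from semantic HTML tags to logical style names.
# The names on the right are illustrative labels, not a real LyX format.
BLOCK = {
    "h1": "Section", "h2": "Subsection", "h3": "Subsubsection",
    "p": "Standard", "pre": "LyX-Code",
}
LISTS = {"ul": "Itemize", "ol": "Enumerate"}
INLINE = {"em": "Emph", "strong": "Strong", "code": "Code"}


class SemanticImporter(HTMLParser):
    """Collect (block_style, [(inline_style, text), ...]) pairs."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self.current = None   # text runs of the block currently open
        self.lists = []       # stack of enclosing list styles
        self.inline = []      # stack of open inline styles

    def _open(self, style):
        self.current = []
        self.blocks.append((style, self.current))

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK:
            self._open(BLOCK[tag])
        elif tag in LISTS:
            self.lists.append(LISTS[tag])
        elif tag == "li":
            self._open(self.lists[-1] if self.lists else "Itemize")
        elif tag in INLINE:
            self.inline.append(INLINE[tag])
        # div, span and every attribute (class, style) are ignored on purpose

    def handle_endtag(self, tag):
        if tag in BLOCK or tag == "li":
            self.current = None
        elif tag in LISTS and self.lists:
            self.lists.pop()
        elif tag in INLINE and self.inline and self.inline[-1] == INLINE[tag]:
            self.inline.pop()

    def handle_data(self, data):
        if self.current is None:
            if not data.strip():
                return                # whitespace between blocks
            self._open("Standard")    # loose text becomes a plain paragraph
        style = self.inline[-1] if self.inline else None
        self.current.append((style, data))


def import_html(html):
    """Return the logical structure found in an HTML fragment."""
    parser = SemanticImporter()
    parser.feed(html)
    parser.close()
    return parser.blocks
```

Text wrapped only in div or span still comes through, but as a plain paragraph with no styling, which is the "middle ground" behaviour described above. Supporting the constructs of LyX's own HTML export would mean extending the tables, presumably keyed on class attributes, which this sketch does not attempt.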
