On 2010-03-23, rgheck wrote:
> On 03/23/2010 05:22 AM, Guenter Milde wrote:
>> On 2010-03-22, rgheck wrote:

>>> As he said, this is highly non-trivial. And the better the website, the
>>> harder it is, since a good website will use semantic markup that is
>>> styled by CSS. Then what do you do?

>> Of course transform semantic markup to semantic markup.  This implies
>> that the website uses "really good" markup (text with HTML markup
>> indicating its logical structure), not CSS-styled <div> and <span> soups.


> The difficulty is that HTML is very limited in what it is capable of 
> marking, for the simple reason that there aren't very many tags. 

That is exactly why our task becomes easy if we decide to support plain
HTML markup only: there are only a few tags to map.

Such an HTML importer would sit midway between simple text import
(no structure or markup preserved) and the "annoying" fully formatted
pasting of some word processors (WPs). The basic structure of the
document (lists, sections, emphasized and preformatted text) would be
preserved in a WYSIWYG way.
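A minimal sketch of such a "semantic markup only" importer, assuming Python's standard html.parser module. The tag-to-layout table and the class name SemanticHTMLImporter are hypothetical; the layout names (Standard, Section, Itemize, Enumerate, LyX-Code) mimic LyX's standard layouts, and the output is a simplified list of LyX-like lines, not a complete .lyx file. Everything that is not a structural tag (div, span, class and style attributes) is simply dropped.

```python
from html.parser import HTMLParser

# Hypothetical tag-to-layout table; the layout names mimic LyX's standard
# layouts, but the mapping itself is only an illustration.
BLOCK_LAYOUTS = {
    "p": "Standard",
    "h1": "Section",
    "h2": "Subsection",
    "h3": "Subsubsection",
    "pre": "LyX-Code",
}
LIST_LAYOUTS = {"ul": "Itemize", "ol": "Enumerate"}
EMPH_TAGS = {"em", "i"}


class SemanticHTMLImporter(HTMLParser):
    """Keep the logical structure of HTML; drop div/span and styling."""

    def __init__(self):
        super().__init__()
        self.out = []       # LyX-like output lines
        self.lists = []     # stack of layouts for the enclosing <ul>/<ol>
        self.in_pre = False

    def handle_starttag(self, tag, attrs):
        if tag in LIST_LAYOUTS:
            self.lists.append(LIST_LAYOUTS[tag])
        elif tag == "li":
            # A list item takes its layout from the innermost enclosing list.
            layout = self.lists[-1] if self.lists else "Itemize"
            self.out.append(f"\\begin_layout {layout}")
        elif tag in BLOCK_LAYOUTS:
            self.out.append(f"\\begin_layout {BLOCK_LAYOUTS[tag]}")
            self.in_pre = tag == "pre"
        elif tag in EMPH_TAGS:
            self.out.append("\\emph on")
        # Everything else (div, span, class/style attributes) is ignored.

    def handle_endtag(self, tag):
        if tag in LIST_LAYOUTS:
            if self.lists:
                self.lists.pop()
        elif tag == "li" or tag in BLOCK_LAYOUTS:
            self.out.append("\\end_layout")
            if tag == "pre":
                self.in_pre = False
        elif tag in EMPH_TAGS:
            self.out.append("\\emph default")

    def handle_data(self, data):
        if self.in_pre:
            # Preformatted text keeps its spacing: one output line per
            # non-empty source line.
            self.out.extend(line for line in data.splitlines() if line)
        else:
            # Elsewhere, follow HTML whitespace rules and collapse runs.
            text = " ".join(data.split())
            if text:
                self.out.append(text)
```

For example, feeding `<h1>Intro</h1><div class='x'><p>Some <em>key</em> text.</p></div>` yields a Section paragraph and a Standard paragraph with emphasis markers, while the surrounding div vanishes. A real importer would also need a document header and handling for tables, links and nested block elements.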

> LyX character styles, for example, would almost uniformly correspond to
> "span", except for the handful of obvious exceptions. That, it seems to
> me, is why "use div and span for everything" has become almost the
> norm.  See e.g. elyxer's HTML output. LyX's is more flexible, because
> it is specifiable in the layout. But the problem remains.

Similar to the goal of a lossless LyX -> LaTeX -> LyX round trip,
we could (as a further option) extend the HTML importer to recognize
the features and constructs of LyX's native HTML export.
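One way to sketch that extension, assuming the export side is driven by a table saying which HTML tag and CSS class each character style is written as: invert that table on import. The EXPORT_STYLES entries and the RoundTripSpans class below are hypothetical and do not reflect LyX's actual export conventions; the point is only that a <span> becomes meaningful again once its class is known to have come from LyX.

```python
from html.parser import HTMLParser

# Hypothetical export table: character style -> (HTML tag, CSS class).
# This stands in for whatever a layout file declares for HTML output.
EXPORT_STYLES = {
    "Noun": ("span", "noun"),
    "Code": ("code", None),
    "Strong": ("strong", None),
}

# Invert it so the importer can map (tag, class) back to the style.
IMPORT_STYLES = {v: k for k, v in EXPORT_STYLES.items()}

# Void elements have no end tag, so they must not touch the stack.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class RoundTripSpans(HTMLParser):
    """Collect text runs, tagging those produced by a known export style."""

    def __init__(self):
        super().__init__()
        self.runs = []   # list of (style name or None, text)
        self.stack = []  # style for each open element (None if unknown)

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        # Prefer a (tag, class) match, then fall back to a bare-tag match.
        style = next(
            (IMPORT_STYLES[(tag, c)] for c in classes if (tag, c) in IMPORT_STYLES),
            IMPORT_STYLES.get((tag, None)),
        )
        self.stack.append(style)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        # The innermost recognized style wins; unknown div/span soup
        # contributes no style at all.
        style = next((s for s in reversed(self.stack) if s), None)
        if data:
            self.runs.append((style, data))
```

With this table, `<p>Ask <span class="noun">Knuth</span> about <code>tex</code>.</p>` produces the runs `(None, "Ask ")`, `("Noun", "Knuth")`, `(None, " about ")`, `("Code", "tex")` and `(None, ".")`, while a span with an unknown class is imported as plain text.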

Günter
