At 10:09 PM 5/31/00 -0500, sam th wrote:
>The way it looks to me from the Abi side of the word importer is that wv
>provides us with all the properties for a span at once.  However, in HTML,
>you have to deal with lots of messy inheritance.  I don't think we should
>assume that every file format we deal with will be as nice to our system
>as wv is currently.  But then again, maybe HTML is an exception. 

Bingo.  Sounds like we've found the core issue.  

AFAICT, for most word processing formats (not just Word) you can easily 
determine all the properties of a span at once.  Since they share this 
characteristic with AbiWord's internal format, doing the required mappings 
is tedious, but not usually that hard. 

By contrast, classic HTML definitely has "lots of messy inheritance" -- 
which is what makes that importer somewhat harder to write.  You have to 
keep track of all the goofy nesting situations.  

The necessary state machines to flatten nested markup really aren't that 
bad, though.  For well-formed XHTML, essentially the transformation you're 
doing is just a tree-walking exercise:

  <B>one      -->   font-weight:bold
  <I>two</I>  -->   font-weight:bold; font-style:italic
  three</B>   -->   font-weight:bold
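
As a rough sketch of that tree walk (in Python, purely for illustration;
the TAG_PROPS table and the flatten() helper are hypothetical names, not
anything in the AbiWord or wv code), each element copies its parent's
properties, adds its own, and emits one flat span per run of text:

```python
import xml.etree.ElementTree as ET

# Hypothetical tag-to-property table, assumed for this example only.
TAG_PROPS = {"b": ("font-weight", "bold"), "i": ("font-style", "italic")}

def flatten(elem, inherited=None):
    """Walk a well-formed tree, yielding (text, props) spans."""
    props = dict(inherited or {})          # copy, so siblings are unaffected
    entry = TAG_PROPS.get(elem.tag.lower())
    if entry:
        props[entry[0]] = entry[1]
    if elem.text and elem.text.strip():
        yield elem.text.strip(), props
    for child in elem:
        yield from flatten(child, props)
        # Text after a child's close tag belongs to the parent element,
        # so it gets the parent's properties, not the child's.
        if child.tail and child.tail.strip():
            yield child.tail.strip(), props

spans = list(flatten(ET.fromstring("<B>one <I>two</I> three</B>")))
for text, props in spans:
    print(text, "-->", "; ".join(f"{k}:{v}" for k, v in props.items()))
```

Because the markup is well-formed, the recursion itself is the "stack":
leaving an element automatically restores the parent's properties.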

However, you'll need a different state machine for classic HTML so that you 
can properly interpret format-toggling "messes" like this:

  <B>one      -->   font-weight:bold
  <I>two</B>  -->   font-weight:bold; font-style:italic
  three</I>   -->   font-style:italic
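
One way to sketch that second state machine (again in Python, with
hypothetical names; ToggleFlattener and flatten_classic() are made up for
this example) is to drop the stack and treat each tag as a switch on a
set of active properties, so close tags can arrive in any order:

```python
from html.parser import HTMLParser

# Hypothetical tag-to-property table, assumed for this example only.
TAG_PROPS = {"b": ("font-weight", "bold"), "i": ("font-style", "italic")}

class ToggleFlattener(HTMLParser):
    """Treat formatting tags as on/off toggles rather than a strict tree."""

    def __init__(self):
        super().__init__()
        self.active = {}   # properties currently switched on
        self.spans = []    # flattened (text, props) output

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names already lowercased.
        if tag in TAG_PROPS:
            key, value = TAG_PROPS[tag]
            self.active[key] = value

    def handle_endtag(self, tag):
        # Switch the property off no matter which tag was opened last.
        if tag in TAG_PROPS:
            self.active.pop(TAG_PROPS[tag][0], None)

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.spans.append((text, dict(self.active)))

def flatten_classic(html):
    parser = ToggleFlattener()
    parser.feed(html)
    parser.close()
    return parser.spans

for text, props in flatten_classic("<B>one <I>two</B> three</I>"):
    print(text, "-->", "; ".join(f"{k}:{v}" for k, v in props.items()))
```

This simple version turns a property off at the first close tag, so
doubly nested tags like <B><B>...</B>...</B> would need a per-property
counter instead of a plain on/off entry.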

Sounds to me like those importers are *exactly* where such code belongs, no?  

Paul


