Dom Lachowicz wrote:
> 
> >Would it be hard to allow non-well-formed HTML to be imported?
> >If not, how hard would it be to report the syntax error back
> >so that the user can fix it and try again?
> 
> Yes, this would be very hard. Currently, we base our HTML importer on our
> XML importer class, which means that the HTML must be well-formed. Writing a
> parser for HTML isn't high on my priority list, though, esp. with all of the
> nastiness that the browsers allow you to do. XHTML deprecates a lot of the
> tags (yay!) and is *very* strict about what can appear where
> (well-formedness).
> 
> It shouldn't be too hard to propagate a more-descriptive error message
> upstream, however.
> 

Any chance of a partial/lossy import: ignore all unknown tags, dump all
unmatched tags?
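
To make the lossy idea concrete, here is a rough sketch in Python using
the standard html.parser module. It is not the importer's code. The names
KNOWN_TAGS, LossyCleaner and lossy_clean are made up for illustration, the
tag whitelist is a guess, and I am assuming "dump" means "silently
discard". Attributes are thrown away too, to keep it short.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist: the tags the real importer understands are not
# listed anywhere in this thread.
KNOWN_TAGS = {"html", "body", "p", "b", "i", "h1", "h2", "ul", "li"}


class LossyCleaner(HTMLParser):
    """Keep known tags, ignore unknown ones, discard unmatched close tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []    # pieces of the cleaned markup
        self.stack = []  # known tags that are currently open

    def handle_starttag(self, tag, attrs):
        if tag in KNOWN_TAGS:  # unknown tags are simply ignored
            self.stack.append(tag)
            self.out.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if tag not in self.stack:
            return  # unknown or unmatched close tag: discard it
        # Close any inner tags that were left open, then the tag itself.
        while self.stack:
            open_tag = self.stack.pop()
            self.out.append(f"</{open_tag}>")
            if open_tag == tag:
                break

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))


def lossy_clean(html_text):
    """Return a well-nested approximation of html_text (assumed helper)."""
    cleaner = LossyCleaner()
    cleaner.feed(html_text)
    cleaner.close()
    # Close whatever is still open at the end of the document.
    while cleaner.stack:
        cleaner.out.append(f"</{cleaner.stack.pop()}>")
    return "".join(cleaner.out)
```

The result could then be handed to the existing XML-based importer. The
text survives; the unknown markup does not, which is the "lossy" part.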

More simply, what I'm trying to suggest is a prompt like: "Error: this
document contains invalid HTML.  Would you like to import it as plain
text with line breaks?"

Please :)
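
Roughly what I have in mind for the fallback, again only as a Python
sketch and not the real importer. import_html, to_plain_text,
TextExtractor and the accept_fallback callback are hypothetical names,
and the list of tags that start a new line is my own assumption.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Assumption: these tags should start a new line in the plain-text version.
BLOCK_TAGS = {"p", "br", "div", "li", "h1", "h2", "h3", "tr"}


class TextExtractor(HTMLParser):
    """Collect only the text, inserting a line break at block-level tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data)


def to_plain_text(markup):
    """Strip all tags, normalise whitespace and drop empty lines."""
    extractor = TextExtractor()
    extractor.feed(markup)
    extractor.close()
    lines = [" ".join(line.split()) for line in "".join(extractor.parts).split("\n")]
    return "\n".join(line for line in lines if line)


def import_html(markup, accept_fallback):
    """Try a strict XML parse; on failure, offer a plain-text import.

    accept_fallback is a hypothetical callback that shows the message to
    the user and returns True if they want the plain-text import.
    """
    try:
        ET.fromstring(markup)  # raises ParseError if not well-formed
        return ("xml", markup)
    except ET.ParseError as err:
        # str(err) carries the parser's reason plus line and column.
        message = f"Error: this document contains invalid HTML ({err})."
        if accept_fallback(message):
            return ("text", to_plain_text(markup))
        raise
```

The message passed to the callback includes the parser's own description
of the failure, so the user gets something to fix even if they decline
the plain-text import.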

Or, as a temporary measure, we could recommend an HTML validator like:
http://validator.w3.org/

-- 
   ~     
  |v|    
 // \\   
/(   )\  
 ^`~'^
