[Libreoffice-bugs] [Bug 40218] FILEOPEN: Calc confused by unclosed HTML tags

bugzilla-daemon Thu, 10 Oct 2013 09:43:53 -0700

https://bugs.freedesktop.org/show_bug.cgi?id=40218


--- Comment #5 from Thomas Arnhold <[email protected]> ---
The HTML importer is confused only by the unclosed anchor tag. I tried other
unclosed tags such as <div>, <span> and <font>, and the import works fine. An
unclosed <a name="foo"> also works. The problem occurs only with <a href="eu">.

One fix would be to end the open anchor manually when the next </td> is
found, but that is a bit of a spaghetti solution:

--- a/editeng/source/editeng/eehtml.cxx
+++ b/editeng/source/editeng/eehtml.cxx
@@ -319,6 +319,7 @@ void EditHTMLParser::NextToken( int nToken )
     case HTML_TABLEHEADER_OFF:
     case HTML_TABLEDATA_OFF:
     {
+        AnchorEnd();
         if ( nInCell )
             nInCell--;
     }
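
To show what the one-line change does, here is a minimal, self-contained
sketch. It is a toy model, not the real EditHTMLParser: the Token enum,
ToyParser, bInAnchor, the log and parseTokens() are hypothetical names made up
for illustration, and AnchorEnd() is assumed to do nothing when no anchor is
open.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical token kinds; the real parser dispatches on HTML_* constants.
enum class Token { CellOn, CellOff, AnchorOn, AnchorOff, Text };

// Toy stand-in for EditHTMLParser that tracks only cell depth and anchor state.
struct ToyParser {
    int nInCell = 0;
    bool bInAnchor = false;          // assumption: at most one open anchor
    std::vector<std::string> log;    // records what the parser did, in order

    void AnchorStart() { bInAnchor = true; log.push_back("anchor-start"); }

    // Assumed to be a no-op when no anchor is open.
    void AnchorEnd() {
        if (bInAnchor) { bInAnchor = false; log.push_back("anchor-end"); }
    }

    void NextToken(Token t) {
        switch (t) {
        case Token::CellOn:
            ++nInCell;
            log.push_back("cell-start");
            break;
        case Token::CellOff:
            AnchorEnd();             // the proposed change: close a dangling anchor
            if (nInCell)
                --nInCell;
            assert(nInCell >= 0);    // cell depth never goes negative
            log.push_back("cell-end");
            break;
        case Token::AnchorOn:  AnchorStart(); break;
        case Token::AnchorOff: AnchorEnd();   break;
        case Token::Text:
            log.push_back(bInAnchor ? "text(link)" : "text");
            break;
        }
    }
};

// Feeds a token sequence through the toy parser and returns its log.
std::vector<std::string> parseTokens(const std::vector<Token>& tokens) {
    ToyParser p;
    for (Token t : tokens)
        p.NextToken(t);
    return p.log;
}
```

Feeding it the sequence cell, unclosed anchor, text, end of cell, then a second
cell with text shows the anchor ending at the first </td>, so the text in the
second cell is no longer treated as part of the link.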


A far better solution for all malformed HTML documents would be to clean them
up in a first pass, before the importer sees them. This could be done as
described at
http://www.mostthingsweb.com/2013/02/parsing-html-with-c/
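
As a rough illustration of the cleanup-first idea, the sketch below repairs the
input string before any parsing happens. It is not tidy and does not use its
API; it is a deliberately naive stand-in that handles only the one case from
this bug (an anchor still open at </td> or </th>). The function names are
hypothetical, and the tag matching is simplistic: it only recognises "<a "
and "<a>" as anchor starts.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Returns true if `html` contains `tag` (given in lower case) at `pos`,
// comparing case-insensitively.
static bool matchesAt(const std::string& html, std::size_t pos,
                      const std::string& tag) {
    if (pos + tag.size() > html.size())
        return false;
    for (std::size_t i = 0; i < tag.size(); ++i) {
        if (std::tolower(static_cast<unsigned char>(html[pos + i])) != tag[i])
            return false;
    }
    return true;
}

// Hypothetical pre-pass: inserts "</a>" before a </td> or </th> that is
// reached while an anchor is still open. Everything else is copied as-is.
std::string closeDanglingAnchors(const std::string& html) {
    std::string out;
    bool anchorOpen = false;
    for (std::size_t i = 0; i < html.size(); ++i) {
        if (html[i] == '<') {
            if (matchesAt(html, i, "</a>")) {
                anchorOpen = false;
            } else if (matchesAt(html, i, "<a ") || matchesAt(html, i, "<a>")) {
                anchorOpen = true;
            } else if (anchorOpen && (matchesAt(html, i, "</td>") ||
                                      matchesAt(html, i, "</th>"))) {
                out += "</a>";       // repair: close the anchor before the cell ends
                anchorOpen = false;
            }
        }
        out += html[i];
    }
    assert(out.size() >= html.size());  // the pass only ever inserts text
    return out;
}
```

The point of the design is that the importer itself stays untouched: a general
cleaner would sit in front of it and handle every kind of malformed input, not
just this one.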

Do we want to include tidy in our project? In my opinion this could be a huge
benefit.

-- 
You are receiving this mail because:
You are the assignee for the bug.

_______________________________________________
Libreoffice-bugs mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/libreoffice-bugs
