Hi Asiri,
On Feb 27, 2009, at 12:32 PM, asiri (SVN) wrote:
> Author: asiri
> Date: 2009-02-27 12:32:21 +0100 (Fri, 27 Feb 2009)
> New Revision: 17078
>
> Added:
> platform/core/trunk/xwiki-officeimporter/src/test/java/org/xwiki/officeimporter/internal/cleaner/AbstractHTMLCleaningTest.java
> platform/core/trunk/xwiki-officeimporter/src/test/java/org/xwiki/officeimporter/internal/cleaner/EmptyLineParagraphOpenOfficeCleaningTest.java
> platform/core/trunk/xwiki-officeimporter/src/test/java/org/xwiki/officeimporter/internal/cleaner/ImageOpenOfficeCleaningTest.java
> platform/core/trunk/xwiki-officeimporter/src/test/java/org/xwiki/officeimporter/internal/cleaner/InvalidTagOpenOfficeCleaningTest.java
> platform/core/trunk/xwiki-officeimporter/src/test/java/org/xwiki/officeimporter/internal/cleaner/LineBreakOpenOfficeCleaningTest.java
> platform/core/trunk/xwiki-officeimporter/src/test/java/org/xwiki/officeimporter/internal/cleaner/LinkOpenOfficeCleaningTest.java
> platform/core/trunk/xwiki-officeimporter/src/test/java/org/xwiki/officeimporter/internal/cleaner/ListOpenOfficeCleaningTest.java
> platform/core/trunk/xwiki-officeimporter/src/test/java/org/xwiki/officeimporter/internal/cleaner/MiscWysiwygCleaningTest.java
> platform/core/trunk/xwiki-officeimporter/src/test/java/org/xwiki/officeimporter/internal/cleaner/RedundantTagOpenOfficeCleaningTest.java
> platform/core/trunk/xwiki-officeimporter/src/test/java/org/xwiki/officeimporter/internal/cleaner/TableOpenOfficeCleaningTest.java
> Removed:
> platform/core/trunk/xwiki-officeimporter/src/test/java/org/xwiki/officeimporter/internal/cleaner/AbstractHTMLCleanerTest.java
> platform/core/trunk/xwiki-officeimporter/src/test/java/org/xwiki/officeimporter/internal/cleaner/OpenOfficeHTMLCleanerTest.java
> platform/core/trunk/xwiki-officeimporter/src/test/java/org/xwiki/officeimporter/internal/cleaner/WysiwygHTMLCleanerTest.java
> Modified:
> platform/core/trunk/xwiki-officeimporter/src/main/java/org/xwiki/officeimporter/filter/LineBreakFilter.java
> Log:
> XWIKI-3265: Restructure officeimporter test cases + write more tests
>
> * Completed.
[snip]
> +public class InvalidTagOpenOfficeCleaningTest extends AbstractHTMLCleaningTest
> +{
> +    /**
> +     * {@code <style>} tags should be stripped from html content.
> +     */
> +    public void testStyleTagRemoving()
> +    {
> +        String html =
> +            "<html><head><title>Title</title>" + "<style type=\"text/css\">h1 {color:red} p {color:blue} </style>"
> +                + "</head><body>" + footer;
> +        Document doc = openOfficeHTMLCleaner.clean(new StringReader(html));
> +        NodeList nodes = doc.getElementsByTagName("style");
> +        assertEquals(0, nodes.getLength());
> +    }
> +
> +    /**
> +     * {@code <style>} tags should be stripped from html content.
Copy-paste error: this javadoc should say <script>.
> + */
> +    public void testScriptTagRemoving()
> +    {
> +        String html = header + "<script type=\"text/javascript\">document.write(\"Hello World!\")</script>" + footer;
> +        Document doc = openOfficeHTMLCleaner.clean(new StringReader(html));
> +        NodeList nodes = doc.getElementsByTagName("script");
> +        assertEquals(0, nodes.getLength());
> +    }
> +}
>
[snip]
> +    /**
> +     * {@code <br/>} elements placed next to paragraph elements should be converted to
> +     * {@code <div class="wikikmodel-emptyline"/>} elements.
> +     */
> +    public void testLineBreaksNextToParagraphElements()
> +    {
> +        checkLineBreakReplacements("<br/><br/><p>para</p>", 0, 2);
> +        checkLineBreakReplacements("<p>para</p><br/><br/>", 0, 2);
> +        checkLineBreakReplacements("<p>para</p><br/><br/><p>para</p>", 0, 2);
> +    }
Shouldn't this be done by the default HTML Cleaner?
Same for the other tests in this category.
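For reference, the transformation these tests exercise can be sketched in a few lines of plain DOM code. This is a hypothetical, simplified version for illustration only (the class and method names are mine, not the actual LineBreakFilter, which is more selective and only converts runs of breaks adjacent to paragraphs):

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class LineBreakSketch
{
    /** Replaces every br element with an empty-line div (simplified). */
    public static void replaceLineBreaks(Document doc)
    {
        NodeList breaks = doc.getElementsByTagName("br");
        // Snapshot first: getElementsByTagName returns a *live* list, and
        // replacing nodes while iterating it would skip elements.
        Node[] snapshot = new Node[breaks.getLength()];
        for (int i = 0; i < snapshot.length; i++) {
            snapshot[i] = breaks.item(i);
        }
        for (Node br : snapshot) {
            Element div = doc.createElement("div");
            div.setAttribute("class", "wikikmodel-emptyline");
            br.getParentNode().replaceChild(div, br);
        }
    }

    public static void main(String[] args) throws Exception
    {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader("<body><br/><br/><p>para</p></body>")));
        replaceLineBreaks(doc);
        System.out.println(doc.getElementsByTagName("br").getLength());  // 0
        System.out.println(doc.getElementsByTagName("div").getLength()); // 2
    }
}
```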
> +    /**
> +     * The html generated by the open office server includes anchors of the form
> +     * {@code <a name="table1"><h1>Sheet 2: <em>Hello</em></h1></a>} and the default html cleaner
> +     * converts them to {@code <a name="table1"/><h1><a name="table1">Sheet 1: <em>Hello</em></a></h1>}
> +     * because of the close-before-copy-inside behaviour of the default html cleaner. Thus the
> +     * additional (copy-inside) anchor needs to be ripped off.
This looks like a bug in the default HTML cleaner, no?
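To make the intended fix concrete, here is a hypothetical sketch (my names, not the committed code) of one way to rip off the copy-inside duplicates: keep the first anchor for each name and unwrap any later anchor carrying the same name:

```java
import java.io.StringReader;
import java.util.HashSet;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class AnchorSketch
{
    /** Keeps the first anchor per name; unwraps later same-name duplicates. */
    public static void stripDuplicateAnchors(Document doc)
    {
        NodeList anchors = doc.getElementsByTagName("a");
        // Snapshot: the NodeList is live and shrinks as anchors are removed.
        Node[] snapshot = new Node[anchors.getLength()];
        for (int i = 0; i < snapshot.length; i++) {
            snapshot[i] = anchors.item(i);
        }
        Set<String> seen = new HashSet<String>();
        for (Node node : snapshot) {
            Element anchor = (Element) node;
            String name = anchor.getAttribute("name");
            if (name.length() == 0 || seen.add(name)) {
                continue; // unnamed, or first occurrence: keep it.
            }
            // Duplicate: move its children up, then drop the anchor element.
            Node parent = anchor.getParentNode();
            while (anchor.hasChildNodes()) {
                parent.insertBefore(anchor.getFirstChild(), anchor);
            }
            parent.removeChild(anchor);
        }
    }

    public static void main(String[] args) throws Exception
    {
        String html = "<body><a name=\"table1\"/><h1><a name=\"table1\">Sheet 1: <em>Hello</em></a></h1></body>";
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(html)));
        stripDuplicateAnchors(doc);
        System.out.println(doc.getElementsByTagName("a").getLength()); // 1
    }
}
```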
> +    /**
> +     * If there are leading spaces within the content of a list item ({@code <li/>}) they should be trimmed.
> +     */
> +    public void testListItemContentLeadingSpaceTrimming()
> +    {
> +        String html = header + "<ol><li> Test</li></ol>" + footer;
> +        Document doc = openOfficeHTMLCleaner.clean(new StringReader(html));
> +        NodeList nodes = doc.getElementsByTagName("li");
> +        Node listContent = nodes.item(0).getFirstChild();
> +        assertEquals(Node.TEXT_NODE, listContent.getNodeType());
> +        assertEquals("Test", listContent.getNodeValue());
> +    }
Shouldn't this be done in the default HTML cleaner? Actually, I think
this is already done in the XHTML parser by the whitespace XML filter.
If not, then it's a bug in the whitespace filter.
For all bugs, please refer to the JIRA issue in the javadoc and explain
that the code will be removed once the bug is fixed.
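Wherever it ends up living, the trimming itself is a one-liner over the DOM. A hypothetical sketch (my names, not the actual cleaner or whitespace filter):

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ListTrimSketch
{
    /** Trims leading whitespace from the first text node of each list item. */
    public static void trimListItems(Document doc)
    {
        NodeList items = doc.getElementsByTagName("li");
        for (int i = 0; i < items.getLength(); i++) {
            Node first = items.item(i).getFirstChild();
            if (first != null && first.getNodeType() == Node.TEXT_NODE) {
                first.setNodeValue(first.getNodeValue().replaceAll("^\\s+", ""));
            }
        }
    }

    public static void main(String[] args) throws Exception
    {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader("<ol><li> Test</li></ol>")));
        trimListItems(doc);
        // Prints "Test" with the leading space removed.
        System.out.println(doc.getElementsByTagName("li").item(0).getFirstChild().getNodeValue());
    }
}
```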
> +
> +    /**
> +     * If there is a leading paragraph inside a list item, it should be replaced with its content.
> +     */
> +    public void testListItemContentIsolatedParagraphCleaning()
> +    {
> +        String html = header + "<ol><li><p>Test</p></li></ol>" + footer;
> +        Document doc = openOfficeHTMLCleaner.clean(new StringReader(html));
> +        NodeList nodes = doc.getElementsByTagName("li");
> +        Node listContent = nodes.item(0).getFirstChild();
> +        assertEquals(Node.TEXT_NODE, listContent.getNodeType());
> +        assertEquals("Test", listContent.getNodeValue());
> +    }
> +}
This should be handled by a combination of the XHTML parser and the
Wiki Syntax Renderer, and/or by the default HTML cleaner.
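For clarity, the paragraph-unwrapping the test expects amounts to replacing the element with its own children. A hypothetical sketch (my names, not the importer's code):

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ListParagraphSketch
{
    /** Replaces a leading paragraph inside a list item with its own content. */
    public static void unwrapIsolatedParagraphs(Document doc)
    {
        NodeList items = doc.getElementsByTagName("li");
        for (int i = 0; i < items.getLength(); i++) {
            Node item = items.item(i);
            Node first = item.getFirstChild();
            if (first != null && "p".equals(first.getNodeName())) {
                // Move the paragraph's children up, then drop the paragraph.
                while (first.hasChildNodes()) {
                    item.insertBefore(first.getFirstChild(), first);
                }
                item.removeChild(first);
            }
        }
    }

    public static void main(String[] args) throws Exception
    {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader("<ol><li><p>Test</p></li></ol>")));
        unwrapIsolatedParagraphs(doc);
        System.out.println(doc.getElementsByTagName("p").getLength()); // 0
        System.out.println(doc.getElementsByTagName("li").item(0).getFirstChild().getNodeValue()); // Test
    }
}
```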
> +    /**
> +     * Test cleaning of html paragraphs bearing namespaces.
> +     */
> +    public void testParagraphsWithNamespaces()
> +    {
> +        String html = header + "<w:p>paragraph</w:p>" + footer;
> +        Document doc =
> +            wysiwygHTMLCleaner.clean(new StringReader(html),
> +                Collections.singletonMap(HTMLCleaner.NAMESPACES_AWARE, "false"));
> +        NodeList nodes = doc.getElementsByTagName("p");
> +        assertEquals(1, nodes.getLength());
> +    }
Hmmm... I think this needs to be reviewed, and we need to check
whether the wikimodel XHTML parser supports namespaces.
> +
> +    /**
> +     * The source of the images in copy pasted html content should be replaced with 'Missing.png'
> +     * since they can't be uploaded automatically.
> +     */
> +    public void testImageFiltering()
> +    {
> +        String html = header + "<img src=\"file://path/to/local/image.png\"/>" + footer;
> +        Document doc = wysiwygHTMLCleaner.clean(new StringReader(html));
> +        NodeList nodes = doc.getElementsByTagName("img");
> +        assertEquals(1, nodes.getLength());
> +        Element image = (Element) nodes.item(0);
> +        Node startComment = image.getPreviousSibling();
> +        Node stopComment = image.getNextSibling();
> +        assertEquals(Node.COMMENT_NODE, startComment.getNodeType());
> +        assertTrue(startComment.getNodeValue().equals("startimage:Missing.png"));
It should be lowercase "missing.png". So this means a missing.png
image needs to be present in all skins?
Has this been discussed, and is everyone aware of it?
> +    /**
> +     * Test filtering of tags which don't have any attributes set.
> +     */
> +    public void testFilterIfZeroAttributes()
> +    {
> +        String htmlTemplate = header + "<p>Test%sRedundant%sFiltering</p>" + footer;
> +        String[] filterIfZeroAttributesTags = new String[] {"span", "div"};
> +        for (String tag : filterIfZeroAttributesTags) {
> +            String startTag = "<" + tag + ">";
> +            String endTag = "</" + tag + ">";
> +            String html = String.format(htmlTemplate, startTag, endTag);
> +            Document doc = openOfficeHTMLCleaner.clean(new StringReader(html));
> +            NodeList nodes = doc.getElementsByTagName(tag);
> +            assertEquals(0, nodes.getLength());
> +        }
> +    }
Shouldn't this be done in the default HTML cleaner?
> +
> +    /**
> +     * Test filtering of tags which don't have any textual content in them.
> +     */
> +    public void testFilterIfNoContent()
> +    {
> +        String htmlTemplate = header + "<p>Test%sRedundant%s%s%sFiltering</p>" + footer;
> +        String[] filterIfNoContentTags =
> +            new String[] {"em", "strong", "dfn", "code", "samp", "kbd", "var", "cite", "abbr",
> +                "acronym", "address", "blockquote", "q", "pre", "h1", "h2", "h3", "h4", "h5", "h6"};
> +        for (String tag : filterIfNoContentTags) {
> +            String startTag = "<" + tag + ">";
> +            String endTag = "</" + tag + ">";
> +            String html = String.format(htmlTemplate, startTag, endTag, startTag, endTag);
> +            Document doc = openOfficeHTMLCleaner.clean(new StringReader(html));
> +            NodeList nodes = doc.getElementsByTagName(tag);
> +            assertEquals(1, nodes.getLength());
> +        }
> +    }
> +}
Shouldn't this be done in the default HTML cleaner?
> +    /**
> +     * An isolated paragraph inside a table cell should be replaced with the paragraph's content.
> +     */
> +    public void testTableCellItemIsolatedParagraphCleaning()
> +    {
> +        String html = header + "<table><tr><td><p>Test</p></td></tr></table>" + footer;
> +        Document doc = openOfficeHTMLCleaner.clean(new StringReader(html));
> +        NodeList nodes = doc.getElementsByTagName("td");
> +        Node cellContent = nodes.item(0).getFirstChild();
> +        assertEquals(Node.TEXT_NODE, cellContent.getNodeType());
> +        assertEquals("Test", cellContent.getNodeValue());
> +    }
Isn't this already tested above?
In any case shouldn't this be moved out of the importer?
Same for other tests in the same category.
> +    /**
> +     * If multiple paragraphs are found inside a table cell, they should be wrapped in an embedded document.
> +     */
> +    public void testTableCellItemMultipleParagraphWrapping()
> +    {
> +        assertEquals(true,
> +            checkEmbeddedDocumentGeneration("<table><tr><td><p>Test</p><p>Test</p></td></tr></table>", "td"));
> +    }
This looks like a bug in the XHTML parser.
Same for other tests in the same category.
> +
> +    /**
> +     * Empty rows should be removed.
> +     */
> +    public void testEmptyRowRemoving()
> +    {
> +        String html = header + "<table><tbody><tr><td>cell</td></tr><tr></tr></tbody></table>" + footer;
> +        Document doc = openOfficeHTMLCleaner.clean(new StringReader(html));
> +        NodeList nodes = doc.getElementsByTagName("tr");
> +        assertEquals(1, nodes.getLength());
> +    }
Shouldn't this be done in the default HTML cleaner?
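If it does move to the default cleaner, the empty-row removal is straightforward. A hypothetical sketch (my names, not the cleaner's code), noting the usual live-NodeList trap:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class TableRowSketch
{
    /** Removes tr elements that have no children at all. */
    public static void removeEmptyRows(Document doc)
    {
        NodeList rows = doc.getElementsByTagName("tr");
        // Snapshot first: the NodeList is live and shrinks as rows are removed.
        Node[] snapshot = new Node[rows.getLength()];
        for (int i = 0; i < snapshot.length; i++) {
            snapshot[i] = rows.item(i);
        }
        for (Node row : snapshot) {
            if (!row.hasChildNodes()) {
                row.getParentNode().removeChild(row);
            }
        }
    }

    public static void main(String[] args) throws Exception
    {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(
                "<table><tbody><tr><td>cell</td></tr><tr></tr></tbody></table>")));
        removeEmptyRows(doc);
        System.out.println(doc.getElementsByTagName("tr").getLength()); // 1
    }
}
```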
Thanks
-Vincent
http://xwiki.com
http://xwiki.org
http://massol.net
_______________________________________________
devs mailing list
[email protected]
http://lists.xwiki.org/mailman/listinfo/devs