On Aug 18, 2008, at 10:02 AM, Vincent Massol wrote: > > On Aug 18, 2008, at 4:37 AM, Wang Ning wrote: > >> Hi Vincent, >> >> On Mon, Aug 18, 2008 at 2:59 AM, Vincent Massol >> <[EMAIL PROTECTED]> wrote: >>> Hi Wang, >>> >>> I have some questions/remarks on the commit below: >>> >>> 1) It's hard to see what you're modifying from the default tag infos >>> (in default.xml). I think a better solution might be to extend >>> DefaultTagProvider and use the constructor accepting an info >>> provider, >>> see http://htmlcleaner.sourceforge.net/doc/org/htmlcleaner >>> HtmlCleaner.html#HtmlCleaner(org.htmlcleaner.ITagInfoProvider) >>> . In this manner we just extend the default and not duplicate it. >>> WDYT? Is this possible? Are you adding tags or redefining default >>> one? >>> >> I have seen this DefaultTagProvider, but I don't think it can provide >> what I want. >> I want to replace some tags with others. > > public class MyProvider extends DefaultTagProvided > { > public TagInfo getTagInfo(String tagName) > { > if (tagName.equals("s") { > return new TagInfo(....); > } else { > return super.getTagInfo(tagName) > } > } > } > > or something like that. > > I guess it looks better if we have only a few changes to make. If we > have a lot then the file solution might be better. In any case make > sure you separate and document your changes from the default in > default.xml. > >> Such like replace <p> in <li> >> to <span>, replace deprecated tags with others. Like replace <s> to >> <del>. But the default.xml can define this. >> >>> 2) The following code is not required: >>> >>>> + private void removeHead(TagNode node) >>>> + { >>>> + logger.debug("remove the head tag of the html"); >>>> + TagNode head = node.findElementByName("head", true); >>>> + head.removeFromTree(); >>>> + } >>> >>> instead you can just use props.setOmitHtmlEnvelope(true); >> I have tried props.setOmitHtmlEnvelope(true), but there are some >> errors. If input is >> <html> >> <head/> >> <body> >> <p>p1</p> >> <p>p2</p> >> </body> >> </html> >> the output TagNode is not <p>p1</p><p>p2</p> but only <p>p2</p>. So I >> have to use removeHead() to remove the head tag. > > htmlcleaner seems quite buggy... Even with setOmitHtmlEnvelope to > false it still generate something strange with an extra line between > the 2 paragraphs of your example. Do you think you could report an > issue in the issue tracker of the htmlcleaner project so that they > can fix it in the next release?
In the meantime would be good if you could submit a patch for our HTMLCleaner to add this new filter with good javadoc explaining why we need it, a reference to the issue URL you'd have created. I'll commit it in the xwiki-xml module. Thanks -Vincent >>> 3) Could you explain why we need convertAllAttribute2Lowercase? >>> >> In xhtml, all the attributes value should be lowercase. > > Do you have a link to the XHTML spec for this? I can only find > http://www.w3.org/TR/xhtml1/#h-4.2 > but this is different. > > I have only found this > http://lists.xml.org/archives/xml-dev/200109/msg00718.html > > I'd like to know what we gain by transforming in lowercase. Since > it's extra effort we need to rationalize it. > >> And some >> attributes value in the conversion result of openoffice is uppercase, >> like algin="CENTER". I think lowercase will be better. But if >> xhtmlparser could handler uppercase attributes value, I will remove >> this method. >>> 4) Could you refactor your code to use the new HTMLCleaner code I >>> have >>> put in xwiki-xml module? >> I will do it today. > > cool > > Thanks > -Vincent > >> >>> >>> Feel free to provide patches for CleaningFilters. >> [snip] >> >> -- >> Thanks >> Wang Ning >> _______________________________________________ >> devs mailing list >> [email protected] >> http://lists.xwiki.org/mailman/listinfo/devs > _______________________________________________ devs mailing list [email protected] http://lists.xwiki.org/mailman/listinfo/devs

