On Aug 18, 2008, at 4:37 AM, Wang Ning wrote:

> Hi Vincent,
>
> On Mon, Aug 18, 2008 at 2:59 AM, Vincent Massol <[EMAIL PROTECTED]>  
> wrote:
>> Hi Wang,
>>
>> I have some questions/remarks on the commit below:
>>
>> 1) It's hard to see what you're modifying from the default tag infos
>> (in default.xml). I think a better solution might be to extend
>> DefaultTagProvider and use the constructor accepting an info  
>> provider,
>> see http://htmlcleaner.sourceforge.net/doc/org/htmlcleaner  
>> HtmlCleaner.html#HtmlCleaner(org.htmlcleaner.ITagInfoProvider)
>> . In this manner we just extend the default and not duplicate it.
>> WDYT? Is this possible? Are you adding tags or redefining default  
>> one?
>>
> I have seen this DefaultTagProvider, but I don't think it can provide
> what I want.
> I want to replace some tags with others.

public class MyProvider extends DefaultTagProvided
{
   public TagInfo getTagInfo(String tagName)
   {
     if (tagName.equals("s") {
      return new TagInfo(....);
     } else {
       return super.getTagInfo(tagName)
   }
   }
}

or something like that.

I guess it looks better if we have only a few changes to make. If we  
have a lot then the file solution might be better. In any case make  
sure you separate and document your changes from the default in  
default.xml.

> Such like replace <p> in <li>
> to <span>, replace deprecated tags with others. Like replace <s> to
> <del>. But the default.xml can define this.
>
>> 2) The following code is not required:
>>
>>> +    private void removeHead(TagNode node)
>>> +    {
>>> +        logger.debug("remove the head tag of the html");
>>> +        TagNode head = node.findElementByName("head", true);
>>> +        head.removeFromTree();
>>> +    }
>>
>> instead you can just use         props.setOmitHtmlEnvelope(true);
> I have tried props.setOmitHtmlEnvelope(true), but there are some
> errors. If input is
> <html>
> <head/>
> <body>
> <p>p1</p>
> <p>p2</p>
> </body>
> </html>
> the output TagNode is not <p>p1</p><p>p2</p> but only <p>p2</p>. So I
> have to use removeHead() to remove the head tag.

htmlcleaner seems quite buggy... Even with setOmitHtmlEnvelope to  
false it still generate something strange with an extra line between  
the 2 paragraphs of your example. Do you think you could report an  
issue in the issue tracker of the htmlcleaner project so that they can  
fix it in the next release?

>> 3) Could you explain why we need convertAllAttribute2Lowercase?
>>
> In xhtml, all the attributes value should be lowercase.

Do you have a link to the XHTML spec for this? I can only find 
http://www.w3.org/TR/xhtml1/#h-4.2 
  but this is different.

I have only found this 
http://lists.xml.org/archives/xml-dev/200109/msg00718.html

I'd like to know what we gain by transforming in lowercase. Since it's  
extra effort we need to rationalize it.

> And some
> attributes value in the conversion result of openoffice is uppercase,
> like algin="CENTER". I think lowercase will be better. But if
> xhtmlparser could handler uppercase attributes value, I will remove
> this method.
>> 4) Could you refactor your code to use the new HTMLCleaner code I  
>> have
>> put in xwiki-xml module?
> I will do it today.

cool

Thanks
-Vincent

>
>>
>> Feel free to provide patches for CleaningFilters.
> [snip]
>
> -- 
> Thanks
> Wang Ning
> _______________________________________________
> devs mailing list
> [email protected]
> http://lists.xwiki.org/mailman/listinfo/devs

_______________________________________________
devs mailing list
[email protected]
http://lists.xwiki.org/mailman/listinfo/devs

Reply via email to