
Vincent Massol Tue, 24 Feb 2009 00:58:08 -0800

Hi Asiri,

On Feb 24, 2009, at 7:26 AM, Asiri Rathnayake wrote:


> Hi Vincent,
>
>
>> This is a bug in the XHTML parser. It should generate an embedded
>> document. This is true for any block element inside a table cell.
>>
>> However, in order to get simpler XWiki syntax, we could modify the
>> XWiki Syntax Renderer to remove the embedded doc in case it contains
>> only a paragraph.
>>
>
> I will raise a JIRA issue for this.
>
>
>>
>>> Now that you asked about it, I might have been working around a
>>> possible bug in rendering. These are the solutions I saw:
>>>
>>> 1. Wrap the paragraph inside <div class="xwiki-document">: this
>>> results in enlarged table header elements.
>>
>> Why?
>
>
> I'm talking with respect to the original Word document. This is a
> problem with the OO server's HTML generation: because it generates a
> paragraph inside each table cell / table header item, the generated
> HTML looks enlarged when rendered in a browser. Also, since we strip
> the <style> tags, the content gets even more enlarged.

I was asking why having <div class="xwiki-document"> didn't work  
nicely since this is the correct behavior. We should get:

<td><div class="xwiki-document"><p>whatever</p></div></td>

I don't understand why this would not be represented the same as in OO.
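For illustration only, here is a minimal Java sketch of that normalization using the JDK's DOM and Transformer APIs: the content of any td/th that contains a block element is wrapped in <div class="xwiki-document">, which yields the expected markup shown above. The class name TableCellNormalizer and the list of block elements are assumptions; this is not the XWiki HTML cleaner API, and it takes no position on which cleaner (office or default HTML) such a step belongs in.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical helper, not part of XWiki's HTML cleaner API. */
public final class TableCellNormalizer {

    // Block-level elements that trigger wrapping (an assumed, non-exhaustive list).
    private static final Set<String> BLOCK_ELEMENTS = Set.of(
            "p", "div", "ul", "ol", "dl", "table", "pre", "blockquote",
            "h1", "h2", "h3", "h4", "h5", "h6");

    private TableCellNormalizer() {
    }

    /** Wraps the content of every td/th that has a block child in a div.xwiki-document. */
    public static String normalize(String xhtml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
            for (String tag : new String[] {"td", "th"}) {
                NodeList cells = doc.getElementsByTagName(tag);
                for (int i = 0; i < cells.getLength(); i++) {
                    wrapIfNeeded(doc, (Element) cells.item(i));
                }
            }
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter writer = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(writer));
            return writer.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not normalize XHTML", e);
        }
    }

    private static void wrapIfNeeded(Document doc, Element cell) {
        boolean hasBlockChild = false;
        for (Node child = cell.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child.getNodeType() != Node.ELEMENT_NODE) {
                continue;
            }
            Element element = (Element) child;
            if ("xwiki-document".equals(element.getAttribute("class"))) {
                return; // already normalized, keep the cell as-is
            }
            hasBlockChild |= BLOCK_ELEMENTS.contains(element.getTagName());
        }
        if (!hasBlockChild) {
            return; // inline-only content needs no embedded document
        }
        Element wrapper = doc.createElement("div");
        wrapper.setAttribute("class", "xwiki-document");
        while (cell.getFirstChild() != null) {
            wrapper.appendChild(cell.getFirstChild()); // appendChild moves the node
        }
        cell.appendChild(wrapper);
    }
}
```

With this sketch, <td><p>whatever</p></td> becomes <td><div class="xwiki-document"><p>whatever</p></div></td>, while a cell holding only text is left untouched, and running it twice gives the same result.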

> To work around this problem I chose to strip any isolated paragraph
> elements found inside table cells / table header items.
>
>
>>> 2. Remove the paragraph if it's an isolated one (only one paragraph
>>> inside the 'th' element); if there is more than one paragraph or
>>> other elements (like lists), wrap the content of the 'th' element
>>> inside a <div class="xwiki-document">.
>>>
>>> I've been using the second approach because it has yielded the best
>>> results so far... Now, have I been working around a bug which should
>>> be fixed in rendering? :)
>>
>> I think so. In addition, you haven't fixed the problem in the general
>> case, for example if someone chooses HTML 4.01 syntax for wiki pages.
>>
>
>> Even if the problem were not in the parser/renderer, IMO you should
>> still have moved the fix into the default HTML cleaner rather than the
>> office cleaner, since I don't see the relationship with office import.
>
>
> I don't think this is correct. If the user chooses HTML 4.01 syntax,
> he knows what he is doing and expects table cells / table header items
> to appear large if he puts a <p> inside a <td> or <th> item.

This is not about large or not large (look and feel is handled by the
CSS only), and we need to normalize the HTML in exactly the same manner.

> But the story is different for OO-generated HTML, which puts in a
> paragraph element where there shouldn't be one.

I don't agree: it's perfectly valid to have <p> inside cells, and it's
not an OO problem.

> That is why I believed that this particular issue belongs in the
> officeimporter module and not in the HTML cleaner module.

I still think the HTML parser should generate the following events:  
beginCell, beginDocument, beginPara, onWord, endPara, endDocument,  
endCell.

I also still think that, as an optimization, the Wiki Syntax Renderer
should remove the embedded doc in case there's a single para in the
embedded doc.
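The two points above can be sketched in Java, assuming events are modelled as plain strings named after the ones listed (this is not the real XWiki rendering Listener API, and EmbeddedDocumentCollapser is a hypothetical name). The input list stands for what the parser would emit; collapse() performs the suggested renderer-side optimization, dropping the beginDocument/endDocument pair of an embedded document that holds a single paragraph while always keeping the root document.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch, not the XWiki rendering API: events are plain strings
 * such as "beginCell", "beginDocument", "beginPara", "onWord".
 */
public final class EmbeddedDocumentCollapser {

    private EmbeddedDocumentCollapser() {
    }

    /** Removes the wrapper of every embedded document holding a single paragraph. */
    public static List<String> collapse(List<String> events) {
        return collapse(events, false); // the root document is never collapsed
    }

    private static List<String> collapse(List<String> events, boolean insideDocument) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < events.size()) {
            if (!events.get(i).equals("beginDocument")) {
                out.add(events.get(i));
                i++;
                continue;
            }
            int end = matchingEnd(events, i);
            // Collapse nested documents first so the check below sees their final shape.
            List<String> inner = collapse(events.subList(i + 1, end), true);
            if (insideDocument && isSingleParagraph(inner)) {
                out.addAll(inner); // drop the redundant embedded document wrapper
            } else {
                out.add("beginDocument");
                out.addAll(inner);
                out.add("endDocument");
            }
            i = end + 1;
        }
        return out;
    }

    /** Index of the endDocument matching the beginDocument at {@code start}. */
    private static int matchingEnd(List<String> events, int start) {
        int depth = 0;
        for (int j = start; j < events.size(); j++) {
            if (events.get(j).equals("beginDocument")) {
                depth++;
            } else if (events.get(j).equals("endDocument") && --depth == 0) {
                return j;
            }
        }
        throw new IllegalArgumentException("Unbalanced beginDocument at index " + start);
    }

    /** True if the events form exactly one beginPara ... endPara block. */
    private static boolean isSingleParagraph(List<String> events) {
        return !events.isEmpty()
                && events.get(0).equals("beginPara")
                && events.indexOf("endPara") == events.size() - 1;
    }
}
```

For example, beginDocument, beginCell, beginDocument, beginPara, onWord, endPara, endDocument, endCell, endDocument becomes beginDocument, beginCell, beginPara, onWord, endPara, endCell, endDocument; a cell holding two paragraphs keeps its embedded document.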

Thanks
-Vincent
http://xwiki.com
http://xwiki.org
http://massol.net






_______________________________________________
devs mailing list
[email protected]
http://lists.xwiki.org/mailman/listinfo/devs

Re: [xwiki-devs] [xwiki-notifications] r16999 - platform/core/trunk/xwiki-officeimporter/src/test/java/org/xwiki/officeimporter/internal/cleaner
