Vincent Massol wrote:
> On Feb 24, 2009, at 4:48 PM, Sergiu Dumitriu wrote:
> 
>> Asiri Rathnayake wrote:
>>> Hi Vincent,
>>>
>>>>> But the story is different for OO generated html which puts a
>>>>> paragraph element when there shouldn't be one.
>>>> I don't agree since it's very valid to have <p> inside cells and
>>>> not an OO problem.
>>>
>>> It's very valid to have <p> elements inside table cells. But my point
>>> is this:
>>>
>>> The original word document when viewed through _oo writer_ displays
>>> content within table cells with a particular size. But when saved as
>>> html and viewed from a browser, the same table cell becomes enlarged.
>>> And this is because there is a paragraph element inside each table
>>> cell element generated by oo html generator.
>>>
>>> Now, since we wanted officeimporter to generate wiki content that
>>> would ultimately render an output which looks close to the original
>>> document, I decided to strip the paragraph element (to make it look
>>> smaller and close to the sizing of the original document rendered in
>>> oo writer).
>>>
>>> But if it's only a matter of convention (wiki is wiki, office is
>>> office) and the paragraph should be left alone, I can make that change
>>> easily.
>>>
>>> WDYT?
>>>
>> I for one prefer removing the paragraph. For me, this is clearly an OO
>> shortcoming. Vincent, the idea is not about paragraphs inside table
>> cells in general, but about this particular paragraph that obviously
>> shouldn't be there. The HTML generated by OO is just an intermediary,
>> we're not interested in keeping it as much as possible in the wiki, we
>> just want to extract the data from it and convert it to wiki syntax.
>> The Office importer transforms office documents to wiki documents, and
>> not HTML to wiki. OO wrongly puts paragraphs in there, and the fact
>> that the same HTML looks much different in a browser than the document
>> looks in OO is a good enough argument, IMO.
> 
> This is generic and not specific to OO. HTML allows putting one or
> several paragraphs in table cells, list items, etc., so we need to
> handle those independently of OO.
> If we handle it at the rendering module level then it fixes both OO
> and direct HTML input.

No. We should not strip all the paragraphs that are found inside table
cells. Maybe the user wants those there. But we know for sure that the
_intermediary_ HTML generated by OO contains Ps where it shouldn't. It
is specific. In general we should respect the markup, but in this
specific case it is just a workaround for a third-party bug. The HTML
generated by office suites is messy in general. I for one really hate
the bulky sh1t that MS Word names HTML.
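
To make the distinction concrete, here is a minimal sketch of the kind of
targeted cleanup meant above. It is Python for brevity and is not the
actual officeimporter code. The function name unwrap_lone_paragraphs is
made up, and it assumes the intermediary HTML has already been cleaned
into well-formed XHTML. It unwraps a <p> only when it is the sole content
of a table cell, and leaves cells with several paragraphs or mixed content
alone. Attributes on the unwrapped <p> are dropped.

```python
import xml.etree.ElementTree as ET


def unwrap_lone_paragraphs(xhtml: str) -> str:
    """Unwrap <p> in table cells whose only content is that single <p>.

    Hypothetical helper; assumes `xhtml` is a well-formed XML fragment.
    """
    root = ET.fromstring(xhtml)
    # Snapshot the elements first, since the tree is modified in the loop.
    for cell in list(root.iter()):
        if cell.tag not in ("td", "th"):
            continue
        has_own_text = bool((cell.text or "").strip())
        if has_own_text or len(cell) != 1 or cell[0].tag != "p":
            continue  # several paragraphs or mixed content: respect the markup
        para = cell[0]
        if (para.tail or "").strip():
            continue  # text after the paragraph: not a pure wrapper
        cell.remove(para)
        cell.text = para.text
        cell.extend(list(para))  # keep inline children such as <b> or <a>
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = (
        "<table><tr>"
        "<td><p>a <b>x</b></p></td>"
        "<td><p>b</p><p>c</p></td>"
        "</tr></table>"
    )
    print(unwrap_lone_paragraphs(sample))
```

In the sample, the first cell loses its wrapper paragraph while the second
cell, which has two paragraphs, is kept as is. Whether this runs only on
the OO import path or for all HTML input is exactly the open question in
this thread.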

-- 
Sergiu Dumitriu
http://purl.org/net/sergiu/
_______________________________________________
devs mailing list
[email protected]
http://lists.xwiki.org/mailman/listinfo/devs
