On Feb 24, 2009, at 7:24 PM, Sergiu Dumitriu wrote:

> Vincent Massol wrote:
>> On Feb 24, 2009, at 4:48 PM, Sergiu Dumitriu wrote:
>>
>>> Asiri Rathnayake wrote:
>>>> Hi Vincent,
>>>>
>>>>>> But the story is different for OO generated html which puts a
>>>>>> paragraph element when there shouldn't be one.
>>>>> I don't agree since it's very valid to have <p> inside cells and
>>>>> not an OO problem.
>>>>
>>>> It's very valid to have <p> elements inside table cells. But my
>>>> point is this:
>>>>
>>>> The original word document when viewed through _oo writer_ displays
>>>> content within table cells with a particular size. But when saved
>>>> as html and viewed from a browser, the same table cell becomes
>>>> enlarged. And this is because there is a paragraph element inside
>>>> each table cell element generated by oo html generator.
>>>>
>>>> Now, since we wanted officeimporter to generate wiki content that
>>>> would ultimately render an output which looks close to the original
>>>> document, I decided to strip the paragraph element (to make it look
>>>> smaller and close to the sizing of original document rendered in oo
>>>> writer)
>>>>
>>>> But if it's only a matter of convention (wiki is wiki, office is
>>>> office) and the paragraph should be left alone I can make that
>>>> change easily.
>>>>
>>>> WDYT?
>>>>
>>> I for one prefer removing the paragraph. For me, this is clearly an
>>> OO shortcoming. Vincent, the idea is not about paragraphs inside
>>> table cells in general, but about this particular paragraph that
>>> obviously shouldn't be there. The HTML generated by OO is just an
>>> intermediary, we're not interested in keeping it as much as possible
>>> in the wiki, we just want to extract the data from it and convert it
>>> to wiki syntax. The Office importer transforms office documents to
>>> wiki documents, and not HTML to wiki. OO wrongly puts paragraphs in
>>> there, and the fact that the same HTML looks much different in a
>>> browser than the document looks in OO is a good enough argument, IMO.
>>
>> This is generic and not specific to OO. HTML allows putting one or
>> several paragraphs in table cells, list items, etc. so we need to
>> handle those, independently of OO.
>> If we handle it at the rendering module level then it fixes both OO
>> and direct HTML input.
>
> No. We should not strip all the paragraphs that are found inside table
> cells.

I've never said this! What I told Asiri is that the XHTML parser should
generate the following events:

  beginCell + beginDocument + beginPara + onWord(sometext) + endPara +
  endDocument + endCell

> Maybe the user wants those there.

I don't agree. We're making a transformation and we're not leaving the
user content untouched. For example, if the user enters "**hello" it'll
get converted to "**hello**". There are several cases where we're
transforming what the user enters.

Here I'm proposing that the XWiki Syntax Renderer transforms the events
above into:

  | sometext

instead of:

  | (((sometext)))

> But we know for sure that the _intermediary_ HTML generated by OO
> contains Ps where it shouldn't. It is specific. In general we should
> respect the markup, but in this specific case it is just a workaround
> for a third party bug. HTML generated by office suites is messy in
> general. I for one really hate the bulky sh1t that MS Word names HTML.

I still don't agree. See above.

Thanks
-Vincent
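
The renderer-side simplification proposed in this message can be sketched roughly as follows. This is only an illustration in Python, not XWiki's actual rendering API: the event names come from the message, while the tuple representation and the render_cell function are hypothetical, and the handling of multi-paragraph cells (keeping the "(((...)))" group) is an assumption about what the proposal implies.

```python
from typing import List, Tuple

# Hypothetical event representation: (event name, payload).
# Structural events carry an empty payload.
Event = Tuple[str, str]


def render_cell(events: List[Event]) -> str:
    """Render the events of one table cell as simplified XWiki 2.0 syntax."""
    if (len(events) < 2
            or events[0][0] != "beginCell"
            or events[-1][0] != "endCell"):
        raise ValueError("expected events wrapped in beginCell ... endCell")

    body = events[1:-1]
    # Detect an embedded document directly inside the cell.
    embedded = (len(body) >= 2
                and body[0][0] == "beginDocument"
                and body[-1][0] == "endDocument")
    if embedded:
        body = body[1:-1]

    # Collect the words of each paragraph; endPara needs no action here.
    paragraphs: List[List[str]] = []
    for name, payload in body:
        if name == "beginPara":
            paragraphs.append([])
        elif name == "onWord":
            if not paragraphs:
                paragraphs.append([])  # word seen before any beginPara
            paragraphs[-1].append(payload)

    texts = [" ".join(words) for words in paragraphs]
    if embedded and len(texts) > 1:
        # Assumption: several paragraphs still need the group syntax.
        return "| (((" + "\n\n".join(texts) + ")))"
    # A single paragraph is emitted inline, without the (((...))) wrapper.
    return "| " + " ".join(texts)


single = [
    ("beginCell", ""), ("beginDocument", ""), ("beginPara", ""),
    ("onWord", "sometext"),
    ("endPara", ""), ("endDocument", ""), ("endCell", ""),
]
print(render_cell(single))  # prints: | sometext
```

With this approach the parser keeps reporting the paragraph faithfully, and the decision to drop the wrapper lives in one place in the renderer, so it would apply to both OO-generated HTML and directly entered HTML.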

