On Nov 19, 2008, at 11:20 AM, Guillaume Lerouge wrote:

> Hi Asiri,
>
> On Wed, Nov 19, 2008 at 10:56 AM, Asiri Rathnayake
> <[EMAIL PROTECTED]> wrote:
>
>> Hi Devs,
>>
>> I'm working on implementing the style filtering functionality of the
>> xwiki-office-importer application. But first, I need to make sure that
>> I'm clear about the policy and the correct approach towards filtering
>> style information from imported office documents. I would really
>> appreciate your input on this because I'm not an expert on either HTML
>> or CSS :)
>>
>> OK, I plan to do two types of filtering. The first is filtering various
>> attributes of various elements (like removing the bgcolor attribute from
>> the <body> element). The second is filtering CSS-related stuff. Let's
>> take them one by one.
>>
>> 1. Filtering attributes.
>>
>> This is quite straightforward, but I see two possible approaches.
>>
>> * The first approach is to keep a list of attributes that we allow when
>> importing documents. We'll scan each and every tag and strip off any
>> unwanted attributes present.
>>
>> * The second approach is to associate each tag with the attributes we
>> allow for that tag. A list of legal attributes for common tags is
>> presented here: http://www.devx.com/projectcool/Article/19816.
>> Similarly, we'll have our tag_name->allowed_attributes mapping and
>> filter out all other attributes present.
>>
>> I'm currently leaning towards the second option, WDYT?
>>
>> 2. Filtering CSS styles.
>>
>> OK, there are three ways one can associate CSS with HTML content. Let's
>> take them one by one.
>>
>> (i) External Style Sheet
>>
>> Well, AFAIK the OpenOffice server does not produce this type of output
>> when converting office documents into HTML. I mean it doesn't output
>> HTML files that refer to external CSS files. So I guess this is
>> something we don't need to worry about.
>>
>> (ii) Internal Style Sheet
>>
>> This is something we need to worry about.
>> The OpenOffice server produces HTML pages with content like
>> <head><style type="text/css">....</style></head>.
>>
>> Currently we strip off <style> tags completely regardless of the
>> filtering mode (i.e. whether or not styles are set to be filtered,
>> <style> tags get removed). Does this behaviour need to change?
>>
>> (iii) In-line Styles
>>
>> This is the most common type of styling found (example: <p
>> style="....">). The present behaviour is to strip off this style
>> attribute completely (if filterStyles is set to true). The question is:
>> some inline styles map directly to XWiki 2.0 syntax, like
>> <p style="font-weight:bold">. What are we going to do about these?
>
>
> I can't help you much from the technical perspective. Regarding styles
> that can be directly mapped to XWiki 2.0 syntax, I think they should be
> converted to use that syntax. To summarize my opinion:
>
> - When strict filtering is activated (conversion to XWiki 2.0 syntax):
>   - Only style attributes that can be directly mapped to a wiki syntax
>     element should be kept
>   - This means that NO (% ... %) should appear
>
> Is that fine with everyone?

Yes, fine with me for strict filtering.

Thanks
-Vincent

>> In any case, I will have to parse the in-line style attribute string to
>> filter out the style directives that are not necessary. The complete
>> grammar for in-line style attributes (http://www.w3.org/TR/css-style-attr)
>> seems a bit too complicated to hand-craft a parser for, although in
>> documents converted by OpenOffice I have only seen the
>> "key:value;key:value" format. What would be the correct approach to
>> parsing the style attribute string?
>>
>> Thank you very much for your ideas. :)
_______________________________________________
devs mailing list
[email protected]
http://lists.xwiki.org/mailman/listinfo/devs
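A minimal sketch of the per-tag approach Asiri leans towards in point 1, written in Python purely for illustration (it is not the importer's code). It assumes the OpenOffice output has already been cleaned into a well-formed, namespace-free tree that xml.etree.ElementTree can parse. The ALLOWED_ATTRIBUTES table and the filter_attributes helper are hypothetical; a real table would be drawn from the HTML spec (or the devx.com list referenced above) and from what the wiki renderer can actually use.

```python
import xml.etree.ElementTree as ET

# Hypothetical tag_name -> allowed_attributes table. The entries are
# illustrative; tags missing from it keep no attributes at all.
ALLOWED_ATTRIBUTES = {
    "a": {"href", "name", "title"},
    "img": {"src", "alt", "width", "height"},
    "table": {"border", "cellpadding", "cellspacing"},
    "td": {"colspan", "rowspan"},
    "th": {"colspan", "rowspan"},
}


def filter_attributes(root: ET.Element) -> ET.Element:
    """Remove, in place, every attribute not allowed for its element's tag."""
    for element in root.iter():  # root itself plus all descendants
        allowed = ALLOWED_ATTRIBUTES.get(element.tag, set())
        # Copy the names first: element.attrib is modified inside the loop.
        for name in list(element.attrib):
            if name not in allowed:
                del element.attrib[name]
    return root


doc = ET.fromstring(
    '<body bgcolor="#ffffff"><p style="color:red">Hi '
    '<a href="x.html" target="_blank">link</a></p></body>'
)
print(ET.tostring(filter_attributes(doc), encoding="unicode"))
# <body><p>Hi <a href="x.html">link</a></p></body>
```

Tags missing from the table lose all their attributes, which is what removes bgcolor from <body>. With a single global allowlist (the first approach), an attribute such as href would be accepted on any tag, including ones where it means nothing. In this sketch style is stripped like any other attribute; a real filter would probably hand it to the CSS handling of point 2 instead.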

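On the parsing question and Guillaume's strict-filtering rule, here is another illustrative Python sketch. It splits a style attribute on semicolons that sit outside quotes and parentheses, then keeps only the declarations that have a direct XWiki 2.0 equivalent and drops everything else, so no (% ... %) is emitted. parse_style, to_wiki and the WIKI_MARKUP table are hypothetical names; the table covers only bold, italic, underline and strike-through, and CSS comments, escapes and the rest of the css-style-attr grammar are not handled.

```python
# Hypothetical subset of declarations with a direct XWiki 2.0 equivalent:
# (CSS property, keyword) -> wiki marker placed on both sides of the text.
WIKI_MARKUP = {
    ("font-weight", "bold"): "**",
    ("font-style", "italic"): "//",
    ("text-decoration", "underline"): "__",
    ("text-decoration", "line-through"): "--",
}


def parse_style(style: str) -> dict:
    """Split "key:value;key:value" into {property: value}.

    Property names are lower-cased. A ';' inside quotes or parentheses is
    not treated as a separator, so font-family:"A;B" and url(a;b) survive.
    CSS comments and escapes are NOT handled.
    """
    declarations, current = [], []
    quote, depth = None, 0
    for ch in style:
        if quote:
            if ch == quote:
                quote = None  # closing quote
        elif ch in "\"'":
            quote = ch  # opening quote
        elif ch == "(":
            depth += 1
        elif ch == ")" and depth:
            depth -= 1
        elif ch == ";" and depth == 0:
            declarations.append("".join(current))  # end of one declaration
            current = []
            continue
        current.append(ch)
    declarations.append("".join(current))

    result = {}
    for decl in declarations:
        name, sep, value = decl.partition(":")
        if sep and name.strip():  # skip empty or malformed pieces
            result[name.strip().lower()] = value.strip()
    return result


def to_wiki(text: str, style: str) -> str:
    """Strict mode: wrap text in the markers of every declaration that maps
    directly to wiki syntax; every other declaration is dropped. The text
    itself is inserted as-is (existing wiki markup is not escaped)."""
    declarations = parse_style(style)
    for (prop, keyword), marker in WIKI_MARKUP.items():
        # A value may list several keywords, e.g. "underline line-through".
        if keyword in declarations.get(prop, "").lower().split():
            text = f"{marker}{text}{marker}"
    return text


print(to_wiki("Hello", 'font-weight:bold; color:red; font-family:"A;B"'))
# **Hello**
```

Tolerating quoted and parenthesized semicolons costs little over a plain split(';') and covers values such as font-family lists or url(...). If OpenOffice output turns out to need more of the grammar, a dedicated CSS parser would be a safer choice than extending a hand-written one.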
