Hi Asiri,

On Wed, Nov 19, 2008 at 10:56 AM, Asiri Rathnayake <[EMAIL PROTECTED]> wrote:
> Hi Devs,
>
> I'm working on implementing the style filtering functionality of the
> xwiki-office-importer application. But first, I need to make sure that I'm
> clear about the policy and the correct approach towards filtering style
> information from imported office documents. I would really appreciate your
> input on this because I'm not an expert on either HTML or CSS :)
>
> Ok, I plan to do two types of filtering. One is filtering various attributes
> of various elements (like removing the bgcolor attribute from the <body>
> element). The second one is filtering CSS-related stuff. Let's take them one
> by one.
>
> 1. Filtering attributes.
>
> This is quite straightforward, but I see two possible approaches.
>
> * The first approach is to keep a list of attributes that we allow when
> importing documents. We'll scan each and every tag and strip off any
> unwanted attributes present.
>
> * The second approach is to associate each tag with the attributes we allow
> for that tag. A list of legal attributes for common tags is presented here:
> http://www.devx.com/projectcool/Article/19816. Similarly, we'll have our
> tag_name->allowed_attributes mapping and filter out all other attributes
> present.
>
> I'm currently leaning towards the second option, WDYT?
>
> 2. Filtering CSS styles.
>
> Ok, there are three ways one can associate CSS with HTML content. Let's take
> them one by one.
>
> (i) External Style Sheet
>
> Well, AFAIK the OpenOffice server does not produce this type of output when
> converting office documents into HTML. I mean it doesn't output HTML files
> that refer to external CSS files. So I guess this is something we don't need
> to worry about.
>
> (ii) Internal Style Sheet
>
> This is something we need to worry about. The OpenOffice server produces HTML
> pages with content like <head><style type="text/css">....</style></head>.
>
> Currently we strip off <style> tags completely regardless of the filtering
> mode (i.e. <style> tags get removed whether or not styles are set to be
> filtered). Does this behaviour need to change?
>
> (iii) In-line Styles
>
> This is the most common type of styling found (example: <p style="....">).
> Present behaviour is to strip off this style attribute completely (if
> filterStyles is set to true). The question is, there are some in-line styles
> that map directly to XWiki 2.0 syntax, like <p style="font-weight:bold">;
> what are we going to do about these?

I can't help you much from the technical perspective. Regarding styles that can be directly mapped to XWiki 2.0 syntax, I think they should be converted to use that syntax.

To summarize my opinion:

- When strict filtering is activated (conversion to XWiki 2.0 syntax):
  - Only style attributes that can be directly mapped to a wiki syntax element should be kept
  - This means that NO (% ... %) should appear

Is that fine with everyone?

> In any case, I will have to parse the in-line style attribute string to
> filter out those style directives that are not necessary. The complete
> grammar for in-line style attributes seems to be a bit too complicated to
> hand-craft a parser for (http://www.w3.org/TR/css-style-attr), although in
> OpenOffice-converted documents I have only seen the "key:value;key:value"
> format. What should be the correct approach to parse the style attribute
> string?
>
> Thank you very much for your ideas. :)

--
Guillaume Lerouge
Product Manager - XWiki
Skype ID : wikibc
http://blog.xwiki.com/
_______________________________________________
devs mailing list
[email protected]
http://lists.xwiki.org/mailman/listinfo/devs
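
On the parsing question quoted above, here is a minimal sketch of what strict filtering could look like. It assumes only the simple "key:value;key:value" form that Asiri reports seeing in OpenOffice output, not the full grammar at http://www.w3.org/TR/css-style-attr. It is written in Python purely for illustration and is not xwiki-office-importer code. The names `MAPPABLE`, `parse_inline_style` and `filter_inline_style` are hypothetical, and so is the whitelist itself: which declarations really map to XWiki 2.0 syntax is still to be decided.

```python
# Hypothetical whitelist: (property, value) pairs ASSUMED to map directly to
# a wiki syntax element. The real list is a policy decision, not shown here.
MAPPABLE = {
    ("font-weight", "bold"),
    ("font-style", "italic"),
}


def parse_inline_style(style):
    """Split a simple 'key:value;key:value' string into (property, value) pairs.

    Assumption: values contain no ';' (so no quoted strings or url(...) with
    semicolons). Anything without a ':' is skipped as malformed.
    """
    pairs = []
    for declaration in style.split(";"):
        if ":" not in declaration:
            continue  # empty piece (trailing ';') or malformed declaration
        prop, value = declaration.split(":", 1)
        # Normalise case and whitespace so 'Font-Weight: Bold' still matches.
        prop, value = prop.strip().lower(), value.strip().lower()
        if prop and value:
            pairs.append((prop, value))
    return pairs


def filter_inline_style(style, allowed=MAPPABLE):
    """Keep only the declarations listed in `allowed`, in their original order."""
    kept = [(p, v) for p, v in parse_inline_style(style) if (p, v) in allowed]
    return "; ".join(f"{p}: {v}" for p, v in kept)


print(filter_inline_style("font-weight:bold; color:#FF0000; margin-left: 2cm"))
```

A declaration that survives this filter could then be translated into the matching wiki syntax element instead of being emitted as a (% ... %) parameter block. If OpenOffice ever produces values containing semicolons, the naive split would no longer be safe and a real CSS parser would be the better choice.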

