Hi Asiri,

On Wed, Nov 19, 2008 at 10:56 AM, Asiri Rathnayake <
[EMAIL PROTECTED]> wrote:

> Hi Devs,
>
> I'm working on implementing the style filtering functionality of the
> xwiki-office-importer application. But first, I need to make sure that I'm
> clear about the policy and the correct approach towards filtering style
> information from imported office documents. I would really appreciate your
> input on this because I'm not an expert on either HTML or CSS :)
>
> Ok, I plan to do two types of filtering. One is filtering various
> attributes of various elements (like removing the bgcolor attribute from
> the <body> element). The second is filtering CSS-related stuff. Let's take
> them one by one.
>
> 1. Filtering attributes.
>
> This is quite straightforward, but I see two possible approaches.
>
> * The first approach is to keep a list of attributes that we allow when
> importing documents. We'll scan each and every tag and strip off any
> unwanted attributes present.
>
> * The second approach is to associate each tag with the attributes we allow
> for that tag. A list of legal attributes for common tags is presented here:
> http://www.devx.com/projectcool/Article/19816. Similarly, we'll have our
> tag_name->allowed_attributes mapping and filter out all other attributes
> present.
>
> I'm currently leaning towards the second option, WDYT ?
>
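
For what it's worth, the second option could look roughly like the sketch
below. It is written in Python only for brevity; the ALLOWED_ATTRIBUTES
table and the filter_attributes helper are hypothetical names, and the
attribute lists are placeholders rather than a proposal for the actual
whitelist.

```python
# Hypothetical tag -> allowed-attributes mapping; the real lists would
# have to be decided for the importer.
ALLOWED_ATTRIBUTES = {
    "a": {"href", "name"},
    "img": {"src", "alt", "width", "height"},
    "td": {"colspan", "rowspan"},
    "th": {"colspan", "rowspan"},
}


def filter_attributes(tag, attributes):
    """Return only the attributes allowed for `tag`.

    Tags missing from the mapping keep no attributes at all.
    """
    allowed = ALLOWED_ATTRIBUTES.get(tag.lower(), set())
    return {
        name: value
        for name, value in attributes.items()
        if name.lower() in allowed
    }
```

With this table, a bgcolor attribute on <body> is dropped because <body>
has no entry, while src on <img> is kept.
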
> 2. Filtering CSS styles.
>
> Ok, there are three ways one can associate CSS with HTML content. Let's
> take them one by one.
>
> (i) External Style Sheet
>
> Well, AFAIK the OpenOffice server does not produce this type of output when
> converting office documents into HTML. I mean it doesn't output HTML files
> that refer to external CSS files. So I guess this is something we don't
> need to worry about.
>
> (ii) Internal Style Sheet
>
> This is something we need to worry about. The OpenOffice server produces
> HTML pages with content like
> <head><style type="text/css">....</style></head>.
>
> Currently we strip off <style> tags completely regardless of the filtering
> mode (i.e. <style> tags get removed whether or not styles are set to be
> filtered). Does this behaviour need to change?
>
> (iii) In-line Styles
>
> This is the most common type of styling found (example: <p style="....">).
> The present behaviour is to strip off this style attribute completely (if
> filterStyles is set to true). The question is, there are some inline styles
> that map directly to XWiki 2.0 syntax, like <p style="font-weight:bold">.
> What are we going to do about these?


I can't help you much from the technical perspective. Re styles that can be
directly mapped to XWiki 2.0 syntax, I think they should be converted to use
that syntax. To summarize my opinion:

   - When strict filtering is activated (conversion to XWiki 2.0 syntax):
      - Only style attributes that can be directly mapped to a wiki syntax
        element should be kept.
      - This means that NO (% ... %) should appear.

Is that fine with everyone?
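
To make the idea concrete, here is a rough sketch of what I mean by
"directly mapped", in Python only for brevity. The DIRECT_MAPPINGS table
and the to_wiki_syntax helper are hypothetical, the set of mappable styles
is just my assumption, and property names are assumed to be lower-cased
already.

```python
# Hypothetical table: (CSS property, value) -> XWiki 2.0 inline marker.
DIRECT_MAPPINGS = {
    ("font-weight", "bold"): "**",
    ("font-style", "italic"): "//",
    ("text-decoration", "underline"): "__",
    ("text-decoration", "line-through"): "--",
}


def to_wiki_syntax(text, declarations):
    """Wrap `text` in the marker of every directly mappable declaration.

    `declarations` is a dict of already parsed inline style declarations.
    Anything without a direct mapping is silently dropped, so no
    (% ... %) parameter block is ever produced.
    """
    for (prop, value), marker in DIRECT_MAPPINGS.items():
        if declarations.get(prop, "").strip().lower() == value:
            text = marker + text + marker
    return text
```

So <p style="font-weight:bold; color:red">hello</p> would come out as
**hello**, and the color would simply be lost.
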


> In any case, I will have to parse the in-line style attribute string to
> filter out those style directives that are not necessary. The complete
> grammar for in-line style attributes seems a bit too complicated to parse
> by hand (http://www.w3.org/TR/css-style-attr), although in OpenOffice
> converted documents I have only seen the "key:value;key:value" format. What
> would be the correct approach to parsing the style attribute string?
>
> Thank you very much for your ideas. :)
>
>
> _______________________________________________
> devs mailing list
> [email protected]
> http://lists.xwiki.org/mailman/listinfo/devs
>

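If the "key:value;key:value" format really is all that OpenOffice produces,
a naive split might be enough to start with. The sketch below is only that:
parse_inline_style and filter_inline_style are hypothetical helpers, it is
in Python only for brevity, and it does not implement the full grammar
(a semicolon inside a quoted string or a url(...) value would break it).

```python
def parse_inline_style(style):
    """Parse a simple 'key:value;key:value' string into a dict.

    Property names are lower-cased; empty or malformed chunks are skipped.
    """
    declarations = {}
    for chunk in style.split(";"):
        # Split on the first colon only, so values may contain colons.
        name, sep, value = chunk.partition(":")
        name, value = name.strip().lower(), value.strip()
        if sep and name and value:
            declarations[name] = value
    return declarations


def filter_inline_style(style, allowed_properties):
    """Rebuild the style string, keeping only the allowed properties."""
    kept = {
        name: value
        for name, value in parse_inline_style(style).items()
        if name in allowed_properties
    }
    return "; ".join(f"{name}: {value}" for name, value in kept.items())
```

A proper CSS parsing library would be the safer choice if documents turn
out to contain anything beyond this simple format.
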


-- 
Guillaume Lerouge
Product Manager - XWiki
Skype ID : wikibc
http://blog.xwiki.com/