On Nov 19, 2008, at 11:20 AM, Guillaume Lerouge wrote:

> Hi Asiri,
>
> On Wed, Nov 19, 2008 at 10:56 AM, Asiri Rathnayake <
> [EMAIL PROTECTED]> wrote:
>
>> Hi Devs,
>>
>> I'm working on implementing the style filtering functionality of the
>> xwiki-office-importer application. But first, I need to make sure that
>> I'm clear about the policy and the correct approach to filtering style
>> information from imported office documents. I would really appreciate
>> your input on this because I'm not an expert on either HTML or CSS :)
>>
>> OK, I plan to do two types of filtering. The first is filtering
>> attributes of various elements (like removing the bgcolor attribute
>> from the <body> element). The second is filtering CSS-related content.
>> Let's take them one by one.
>>
>> 1. Filtering attributes.
>>
>> This is quite straightforward, but I see two possible approaches.
>>
>> * The first approach is to keep a single list of attributes that we
>> allow when importing documents. We'll scan every tag and strip off any
>> attribute that is not on the list.
>>
>> * The second approach is to associate each tag with the attributes we
>> allow for that tag. A list of legal attributes for common tags is
>> presented here: http://www.devx.com/projectcool/Article/19816.
>> Similarly, we'll have our own tag_name->allowed_attributes mapping and
>> filter out all other attributes present.
>>
>> I'm currently leaning towards the second option, WDYT?
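
A minimal sketch of the second approach (a per-tag whitelist), written in
Python only for brevity. The tag names and attribute sets below are made
up for illustration and are not the importer's actual policy, and
filter_attributes is a hypothetical helper name.

```python
# Hypothetical tag_name -> allowed_attributes mapping (illustrative only).
ALLOWED_ATTRIBUTES = {
    "a": {"href", "name"},
    "img": {"src", "alt"},
    "td": {"colspan", "rowspan"},
}


def filter_attributes(tag, attributes):
    """Return only the attributes that are whitelisted for this tag.

    Tags missing from the mapping keep no attributes at all.
    """
    allowed = ALLOWED_ATTRIBUTES.get(tag.lower(), set())
    return {
        name: value
        for name, value in attributes.items()
        if name.lower() in allowed
    }
```

With this mapping, a <body bgcolor="..."> element loses its bgcolor
attribute because "body" has no whitelisted attributes, while an <a> tag
keeps href and drops everything else.
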
>>
>> 2. Filtering css styles.
>>
>> OK, there are three ways to associate CSS with HTML content. Let's
>> take them one by one.
>>
>> (i) External Style Sheet
>>
>> Well, AFAIK the OpenOffice server does not produce this type of output
>> when converting office documents into HTML. I mean it doesn't output
>> HTML files that refer to external CSS files. So I guess this is
>> something we don't need to worry about.
>>
>> (ii) Internal Style Sheet
>>
>> This is something we need to worry about. The OpenOffice server
>> produces HTML pages with content like
>> <head><style type="text/css">....</style></head>.
>>
>> Currently we strip off <style> tags completely regardless of the
>> filtering mode (i.e. <style> tags get removed whether or not styles
>> are set to be filtered). Does this behaviour need to change?
>>
>> (iii) In-line Styles
>>
>> This is the most common type of styling found (example:
>> <p style="....">). The present behaviour is to strip off this style
>> attribute completely (if filterStyles is set to true). The question
>> is: some inline styles map directly to XWiki 2.0 syntax, like
>> <p style="font-weight:bold">. What are we going to do about these?
>
>
> I can't help you much from the technical perspective. Re styles that
> can be directly mapped to XWiki 2.0 syntax, I think they should be
> converted to use that syntax. To summarize my opinion:
>
>   - When strict filtering is activated (conversion to XWiki 2.0 syntax)
>   - Only style attributes that can be directly mapped to a wiki syntax
>     element should be kept
>      - This means that NO (% ... %) should appear
>
> Is that fine with everyone?

Yes, fine with me for strict filtering.

Thanks
-Vincent

>> In any case, I will have to parse the in-line style attribute string
>> to filter out the style directives that are not necessary. The
>> complete grammar for in-line style attributes seems a bit too
>> complicated to hand-craft a parser for
>> (http://www.w3.org/TR/css-style-attr), although in OpenOffice-converted
>> documents I have only seen the "key:value;key:value" format. What
>> is the correct approach to parsing the style attribute string?
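
One possible answer, as a rough sketch only: if the input really is
limited to the "key:value;key:value" format, a plain split may be enough.
The Python below assumes exactly that. It does not implement the full
grammar from the W3C document, so a value containing a ";" inside a quoted
string or a url(...) would be split incorrectly. The names
parse_inline_style, filter_inline_style and KEPT_PROPERTIES are
hypothetical, and the set of kept properties is only an example.

```python
def parse_inline_style(style):
    """Parse a 'key:value;key:value' string into a dict.

    Naive: splits on ';' and on the first ':' of each declaration.
    Malformed declarations (no colon, empty name or value) are skipped.
    """
    declarations = {}
    for chunk in style.split(";"):
        name, sep, value = chunk.partition(":")
        name, value = name.strip().lower(), value.strip()
        if sep and name and value:
            declarations[name] = value
    return declarations


# Hypothetical set of properties that survive filtering.
KEPT_PROPERTIES = {"font-weight", "font-style", "text-decoration"}


def filter_inline_style(style, kept=KEPT_PROPERTIES):
    """Rebuild the style string using only the kept properties."""
    pairs = parse_inline_style(style).items()
    return "; ".join(f"{k}: {v}" for k, v in pairs if k in kept)
```

For example, filter_inline_style("font-weight:bold; margin-bottom: 0in")
keeps only the font-weight declaration. Under strict filtering, the
surviving declarations would then still have to be converted to wiki
syntax rather than written back as a style attribute.
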
>>
>> Thank you very much for your ideas. :)
_______________________________________________
devs mailing list
[email protected]
http://lists.xwiki.org/mailman/listinfo/devs
