Re: [xwiki-devs] Office Importer - Importing into multiple wiki pages

Ludovic Dubost Wed, 11 Mar 2009 01:23:17 -0700

Asiri Rathnayake wrote:
> Hi devs,
>
> To implement the above functionality I have created the following UI:
> http://i43.tinypic.com/28l7x2u.png which was derived from the mockups
> located at
> http://incubator.myxwiki.org/xwiki/bin/view/Mockups/ImportCompositeDocument
>
> Descriptions of various fields are as follows:
>
> * Document - The office document to be uploaded (and imported)
>
> * Style filtering - Whether to filter office styles or not
>
> * Heading level to split - If the user wishes to split the imported document
> into multiple wiki pages, he has to select the heading level (h1, h2, h3...
> h6) to be used when splitting the document. If the user does not select a
> heading level, the document will be imported as it is (no splitting).
>
> * Custom split regex - If the user wants to further refine the split
> criterion (based on the content of header) this field allows him to specify
> that criterion through a regular expression.
>
>     Example regular expression: <b>Section</b>.*
>
>     Open Question: Aren't regular expressions a bit too technical for users?
>
>   
This is actually not really needed. I added it to have an extra level
of flexibility.
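
To make the split criterion concrete, here is a minimal sketch in Python
(not XWiki code). It assumes well-formed XHTML without namespaces, and the
function name split_body and its parameters are hypothetical. A heading
starts a new page only if it has the selected level and, when a regex is
given, its serialized inner markup matches that regex.

```python
import re
import xml.etree.ElementTree as ET


def split_body(xhtml, level, heading_regex=None):
    """Split the direct children of <body> into sections at matching headings."""
    body = ET.fromstring(xhtml).find("body")
    pattern = re.compile(heading_regex) if heading_regex else None
    sections = [[]]  # the first list collects anything before the first split point
    for child in body:
        if child.tag == "h%d" % level:
            # Serialize the heading's inner markup so the regex can see tags like <b>.
            inner = (child.text or "") + "".join(
                ET.tostring(sub, encoding="unicode") for sub in child
            )
            if pattern is None or pattern.match(inner):
                sections.append([])
        sections[-1].append(child)
    # Drop the leading section if the document starts directly with a heading.
    return [section for section in sections if section]
```

With no regex, every heading of the selected level is a split point; with a
regex, headings that do not match simply stay inside the current section.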


> * Target space - This is where the resulting document(s) will land.
>   
> * Target (master) page - The main document holding the TOC (in case of
> splitting), otherwise this is the name of resulting wiki page.
>
> * Child pages naming method - If the document is split into multiple pages,
> pages should be named according to some criterion. This combo box allows
> users to specify that criterion.
>
>   
For the 3 previous settings, we will probably need to make the system
extensible. It would be great to provide APIs for that, either ones that
we can implement in Groovy and/or a way to implement this part fully in
script. It might be better in Groovy. So I would suggest we write an
Extension interface with a default implementation and a way to provide
alternative implementations. This would probably mean that each
implementation could take an arbitrary number of parameters. We could
decide that all these parameters are passed as Strings.
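
As a rough illustration of that idea for the child page naming setting,
here is a Python sketch (the real thing would be Java or Groovy). All names
in it (ChildPageNamer, IndexedNamer, HeadingNamer, resolve_namer, the
"separator" and "maxLength" parameters) are hypothetical and not an
existing XWiki API.

```python
import re
from abc import ABC, abstractmethod


class ChildPageNamer(ABC):
    """Hypothetical extension point that decides the name of each child page."""

    @abstractmethod
    def page_name(self, master_page, heading_text, index, params):
        """Return a page name; params is a dict of str -> str."""


class IndexedNamer(ChildPageNamer):
    """Default implementation: master page name plus a running number."""

    def page_name(self, master_page, heading_text, index, params):
        separator = params.get("separator", "")
        return "%s%s%d" % (master_page, separator, index)


class HeadingNamer(ChildPageNamer):
    """Alternative implementation: build the name from the heading text."""

    def page_name(self, master_page, heading_text, index, params):
        # Parameters arrive as strings, so each implementation converts its own.
        max_length = int(params.get("maxLength", "50"))
        words = re.findall(r"\w+", heading_text)
        name = "".join(word[:1].upper() + word[1:] for word in words)[:max_length]
        # Fall back to the default scheme when the heading has no usable text.
        return name or IndexedNamer().page_name(master_page, heading_text, index, params)


# Registry of available implementations, keyed by the value chosen in the UI.
NAMERS = {"indexed": IndexedNamer(), "heading": HeadingNamer()}


def resolve_namer(hint):
    """Look up an implementation by name, falling back to the default one."""
    return NAMERS.get(hint, NAMERS["indexed"])
```

Passing everything as strings keeps the form handling generic: the UI only
has to forward key/value pairs, and each implementation parses what it needs.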

>
> Regarding the implementation, we have two possible approaches.
>
> 1. Implement the splitting at the W3C DOM level (XHTML)
> 2. Implement the splitting at the XDOM level
>
> * In the first approach we will navigate through the child elements directly
> under the <body> tag and find matching heading elements. For the regex, we will
> have to serialize the heading element so that the regex can be evaluated.
> Heading elements can be serialized as explained here:
> http://forums.sun.com/thread.jspa?threadID=698475
>
> * In the second approach we can either use XDOM operations or use a
> SplittingChainingListener. But I don't know whether regex matching is
> possible with this scheme.
>
> Also, regardless of the method we follow, there will be a problem with large
> office documents (say 100MB or so). Loading such a file into memory (dom or
> xdom) would not be a good idea.
>
>   
A streaming parser would probably be best to support very large files.
That way each individual conversion is kept independent and the memory
load is no more than what is needed to import one section.
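
A minimal sketch of the streaming idea, in Python with the standard
library's incremental parser (an illustration only, not the importer's
code). It assumes namespace-free XHTML with <html> as the root, and the
name iter_sections is hypothetical. Sections are yielded one at a time, so
the caller can convert and save each page before the next one is read.

```python
import io
import xml.etree.ElementTree as ET


def iter_sections(source, level):
    """Yield one section at a time as a list of serialized body-level elements."""
    heading = "h%d" % level
    path = []     # tags of the elements that are currently open
    section = []  # serialized elements of the section being collected
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            path.append(elem.tag)
            continue
        path.pop()
        if path != ["html", "body"]:
            continue  # only direct children of <body> are considered
        if elem.tag == heading and section:
            yield section  # hand over the finished section before starting a new one
            section = []
        elem.tail = None  # text between body-level elements is ignored in this sketch
        section.append(ET.tostring(elem, encoding="unicode"))
        elem.clear()  # release the element's content; only an empty shell stays in the tree
    if section:
        yield section


if __name__ == "__main__":
    demo = io.BytesIO(b"<html><body><h1>A</h1><p>one</p><h1>B</h1><p>two</p></body></html>")
    for number, part in enumerate(iter_sections(demo, 1), start=1):
        print(number, part)
```

Only the current section is held in memory here. The empty element shells
left behind by clear() are small but still accumulate, so a real
implementation for 100MB files would also need to detach them.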

Ludovic

> I haven't decided which method to go with yet, so it would be really great if
> we could sort this out as soon as possible.
>
>
>
> Thanks.
>
> - Asiri
> _______________________________________________
> devs mailing list
> [email protected]
> http://lists.xwiki.org/mailman/listinfo/devs
>
>   


-- 
Ludovic Dubost
Blog: http://blog.ludovic.org/
XWiki: http://www.xwiki.com
Skype: ldubost GTalk: ldubost
