Re: [PROPOSAL] New OOXML import framework

Andre Fischer Wed, 21 May 2014 00:01:01 -0700

On 20.05.2014 23:38, Andrea Pescetti wrote:

On 19/05/2014 Andre Fischer wrote:
As one of the first tasks in the OOXML area I would like to propose to
redesign and re-implement the OOXML parser.
I can only agree with this one. We've already discussed it many times, but even the many users who prefer ODF need good support for OOXML for interoperability, and better support for the Microsoft Office native formats is consistently in the top requests.
I propose a new and unified approach that will essentially replace the
current design and implementation.
Sounds good. Especially the idea to be able to automatically know how much of the specification is covered will be helpful.
I also propose to focus first on Impress. Its complexity regarding
OOXML is less than that of Writer and Calc
And this is probably good for users too. In my experience, the import of .PPTX files is the most unsatisfactory one at the moment, with many obvious deficiencies. Improving this one first would already give good results for users.
I have made several experiments regarding the reading of the
specification and generation of parsers and am confident that the
outlined approach will work.
A not-so-original question: we have another Apache project, POI, http://poi.apache.org/ that among other things has an OOXML parser. If we are starting from scratch, why not reuse their code? And, if there are reasons for not reusing it, could we validate this roadmap with the POI developers, who are probably more familiar with OOXML parsing than the average reader of this list?

First, we are not really starting from scratch. There are several components to importing OOXML files. Two important ones are the parser and the callbacks. The parser reads (OO)XML streams and turns them into events for start tags, end tags, text, etc. The callbacks are called for each of these events. This second part is the larger and more important one. I want to replace the parser but would like to migrate as much of the existing callback code as possible.
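To make the parser/callback split concrete, here is a minimal sketch in Python using the standard xml.sax module. It is only an illustration, not the proposed framework or the existing OpenOffice code: the handler class, the sample XML, and the placeholder namespace URIs are all assumptions made up for this example.

```python
import xml.sax


class TextRunHandler(xml.sax.ContentHandler):
    """Hypothetical callback layer: collects the text of <a:t> elements."""

    def __init__(self):
        super().__init__()
        self.in_text = False
        self.runs = []

    def startElement(self, name, attrs):
        # Called by the parser for every start tag.
        if name == "a:t":
            self.in_text = True
            self.runs.append("")

    def endElement(self, name):
        # Called by the parser for every end tag.
        if name == "a:t":
            self.in_text = False

    def characters(self, content):
        # Called by the parser for text; may arrive in several chunks.
        if self.in_text:
            self.runs[-1] += content


# Made-up sample input; the namespace URIs are placeholders.
SAMPLE = (
    b'<p:sp xmlns:p="urn:example:p" xmlns:a="urn:example:a">'
    b"<a:p><a:r><a:t>Hello</a:t></a:r><a:r><a:t>World</a:t></a:r></a:p>"
    b"</p:sp>"
)


def extract_text(xml_bytes):
    """Run the event-producing parser and let the callbacks do the work."""
    handler = TextRunHandler()
    xml.sax.parseString(xml_bytes, handler)
    return handler.runs
```

The point of the split is that the handler only depends on the three event methods, so the component that produces the events could in principle be swapped out while the handler code stays largely as it is.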

Most of the work in the OOXML import/export project, however, will be spent in other areas:

- Implementing features that exist in MS Office but not in OpenOffice. Examples are SmartArt shapes (for all applications).

- Improving features in OpenOffice that are not working as well as they should/could. Examples are pivot tables in Calc or the slide show in Impress.

- Supporting existing features in OpenOffice that are just not handled by the OOXML importer.



Regarding POI, there are several reasons not to use it:

- As said above, the existing import code is to be migrated to the new framework. The new framework should offer an interface that supports this migration.


- POI is implemented in Java.

- As far as I understand POI (I don't find its documentation very helpful), it is more like a DOM tree with better access to its nodes than a streaming parser. That would result in lower execution speed and larger memory consumption.

- In POI, OOXML / MS Office is supported only up to 2007. That seems like an undesirable restriction.

- The original naming (see http://en.wikipedia.org/wiki/Apache_POI) does not imply professional development of the POI project.
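The difference between a DOM-style tree and a streaming parser mentioned in the list above can be sketched roughly as follows. This uses Python's standard xml.dom.minidom and xml.sax modules purely for illustration; it says nothing about how POI is actually implemented, and the element names and helper functions are invented for the example.

```python
import xml.sax
from xml.dom import minidom


def make_doc(n):
    """Build a made-up document with n empty <shape/> elements."""
    return ("<slide>" + "<shape/>" * n + "</slide>").encode("utf-8")


def count_with_dom(data):
    # DOM style: the whole document is turned into an in-memory tree
    # first, then the tree is queried.
    doc = minidom.parseString(data)
    return len(doc.getElementsByTagName("shape"))


class ShapeCounter(xml.sax.ContentHandler):
    """Streaming style: keep only a counter, never the whole tree."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        if name == "shape":
            self.count += 1


def count_with_stream(data):
    handler = ShapeCounter()
    xml.sax.parseString(data, handler)
    return handler.count
```

Both functions return the same count, but the DOM variant has to hold one node object per element before it can answer, while the streaming variant keeps a single integer of state regardless of document size.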



Regards,
Andre


Regards,
  Andrea.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@openoffice.apache.org
For additional commands, e-mail: dev-h...@openoffice.apache.org



