On 20.05.2014 23:38, Andrea Pescetti wrote:
On 19/05/2014 Andre Fischer wrote:
As one of the first tasks in the OOXML area I would like to propose to
redesign and re-implement the OOXML parser.

I can only agree with this one. We've already discussed it many times, but even the many users who prefer ODF need good OOXML support for interoperability, and better support for the Microsoft Office native formats is consistently among the top requests.

I propose a new and unified approach that will essentially replace the
current design and implementation.

Sounds good. Especially the idea to be able to automatically know how much of the specification is covered will be helpful.

I also propose to focus first on Impress. Its complexity regarding
OOXML is less than that of Writer and Calc

And this is probably good for users too. In my experience, the import of .PPTX files is the most unsatisfactory one at the moment, with many obvious deficiencies. Improving this one first would already give good results for users.

I have made several experiments regarding the reading of the
specification and generation of parsers and am confident that the
outlined approach will work.

A not-so-original question: we have another Apache project, POI, http://poi.apache.org/ that among other things has an OOXML parser. If we are starting from scratch, why not reuse their code? And, if there are reasons for not reusing it, could we validate this roadmap with the POI developers, who are probably more familiar with OOXML parsing than the average reader of this list?

First, we are not really starting from scratch. Importing OOXML files involves several components, two of which are important here. The first is the parser that reads (OO)XML streams and turns them into events for start tags, end tags, text, etc. The second is the set of callbacks that are called for each of these events. This second part is the larger and more important one. I want to replace the parser but migrate as much of the existing callback code as possible.
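
To illustrate the split between parser and callbacks, here is a minimal sketch in Python using the standard xml.sax module. It is not the OpenOffice code (which is C++); the EventRecorder class, the parse_events helper and the simplified sample fragment are made up for this example. The parser side produces the events, and the handler methods are the callbacks that receive them.

```python
import xml.sax


class EventRecorder(xml.sax.ContentHandler):
    """Callback side: receives the events that the parser produces."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        # Called by the parser for every start tag.
        self.events.append(("start", name, dict(attrs.items())))

    def endElement(self, name):
        # Called by the parser for every end tag.
        self.events.append(("end", name))

    def characters(self, content):
        # Called for text; a parser may deliver one text run in several pieces.
        if content.strip():
            self.events.append(("text", content.strip()))


# Hypothetical, heavily simplified fragment; real OOXML uses namespaces.
SAMPLE = b'<sp id="1"><txBody><t>Hello</t></txBody></sp>'


def parse_events(data):
    """Parser side: read the stream and feed events to the callbacks."""
    handler = EventRecorder()
    xml.sax.parseString(data, handler)
    return handler.events


if __name__ == "__main__":
    for event in parse_events(SAMPLE):
        print(event)
```

In this arrangement the parser could be swapped out without touching the handler, as long as the new parser delivers the same kinds of events.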

Most of the work in the OOXML import/export project, however, will be spent in other areas:

- Implementing features that exist in MS Office but not in OpenOffice. An example is SmartArt shapes (for all applications).

- Improving features in OpenOffice that are not working as well as they should/could. Examples are pivot tables in Calc or the slide show in Impress.

- Supporting existing features in OpenOffice that are just not handled by the OOXML importer.


Regarding POI, there are several reasons not to use it:

- As said above, the existing import code is to be migrated to the new framework. The new framework should offer an interface that supports this migration.

- POI is implemented in Java.

- As far as I understand POI (I don't find its documentation very helpful), it is more like a DOM tree with better access to its nodes than a streaming parser. That would result in lower execution speed and larger memory consumption.

- POI supports OOXML / MS Office only up to 2007. That seems like an undesirable restriction.

- The original naming (see http://en.wikipedia.org/wiki/Apache_POI) does not suggest that the POI project is developed professionally.
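
The difference between a DOM tree and a streaming parser mentioned above can be sketched with Python's standard library. This says nothing about POI's actual API; the function and class names are made up for the example. Both functions count the occurrences of one element, but the DOM variant first builds the whole document tree in memory, while the streaming variant keeps only a counter.

```python
import xml.sax
from xml.dom import minidom


def count_with_dom(data, tag):
    """DOM style: build the complete tree in memory, then query it."""
    doc = minidom.parseString(data)
    return len(doc.getElementsByTagName(tag))


class TagCounter(xml.sax.ContentHandler):
    """Streaming style: look at each start tag once and keep no tree."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.count = 0

    def startElement(self, name, attrs):
        if name == self.tag:
            self.count += 1


def count_with_stream(data, tag):
    handler = TagCounter(tag)
    xml.sax.parseString(data, handler)
    return handler.count


if __name__ == "__main__":
    # Hypothetical test document with many small elements.
    data = b"<doc>" + b"<t>x</t>" * 1000 + b"</doc>"
    print(count_with_dom(data, "t"), count_with_stream(data, "t"))
```

For large documents the streaming variant's memory use stays roughly constant, whereas the tree grows with the document size.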


Regards,
Andre



Regards,
  Andrea.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@openoffice.apache.org
For additional commands, e-mail: dev-h...@openoffice.apache.org


