On 20.05.2014 23:38, Andrea Pescetti wrote:
On 19/05/2014 Andre Fischer wrote:
As one of the first tasks in the OOXML area I would like to propose to
redesign and re-implement the OOXML parser.
I can only agree with this one. We've already discussed it many times,
but even the many users who prefer ODF need good support for OOXML
for interoperability, and better support for the Microsoft Office
native formats is consistently among the top requests.
I propose a new and unified approach that will essentially replace the
current design and implementation.
Sounds good. The idea of being able to know automatically how much of
the specification is covered will be especially helpful.
I also propose to focus first on Impress. Its complexity regarding
OOXML is less than that of Writer and Calc.
And this is probably good for users too. In my experience, the import
of .PPTX files is the most unsatisfactory one at the moment, with many
obvious deficiencies. Improving this one first would already give good
results for users.
I have made several experiments regarding the reading of the
specification and generation of parsers and am confident that the
outlined approach will work.
A not-so-original question: we have another Apache project, POI,
http://poi.apache.org/ that among other things has an OOXML
parser. If we are starting from scratch, why not reuse their code?
And, if there are reasons for not reusing it, could we validate this
roadmap with the POI developers, who are probably more familiar with
OOXML parsing than the average reader of this list?
First, we are not really starting from scratch. There are several
components to importing OOXML files. Two important ones are the parser,
which reads (OO)XML streams and turns them into events for start tags,
end tags, text, etc., and the callbacks that are called for each of
these events. The callbacks are the larger and more important part. I
want to replace the parser but would like to migrate as many of the
existing callbacks as possible.
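
To make the parser/callback split concrete, here is a minimal sketch in
Python. It is not OpenOffice code: it uses Python's standard xml.sax
module as a stand-in for the event-producing parser, and the handler
class, the sample markup and the namespace URIs are all hypothetical.
The point is only that the callbacks see start-tag, end-tag and text
events and never depend on how the parser produces them, which is what
would allow the parser to be replaced underneath them.

```python
import xml.sax


class TextRunHandler(xml.sax.ContentHandler):
    """Hypothetical callback layer: records events and collects <a:t> text."""

    def __init__(self):
        super().__init__()
        self.events = []      # (kind, element name) pairs, in document order
        self.texts = []       # completed text runs
        self._in_text = False
        self._buffer = []

    def startElement(self, name, attrs):
        # Called by the parser for every start tag.
        self.events.append(("start", name))
        if name == "a:t":
            self._in_text = True
            self._buffer = []

    def endElement(self, name):
        # Called by the parser for every end tag.
        self.events.append(("end", name))
        if name == "a:t":
            self.texts.append("".join(self._buffer))
            self._in_text = False

    def characters(self, content):
        # Text may arrive in several chunks, so it is buffered.
        if self._in_text:
            self._buffer.append(content)


# Made-up sample; the namespace URIs are placeholders, not the real OOXML ones.
SAMPLE = (
    b'<p:sp xmlns:p="urn:example:p" xmlns:a="urn:example:a">'
    b"<p:txBody><a:p><a:r><a:t>Hello</a:t></a:r></a:p></p:txBody>"
    b"</p:sp>"
)


def parse_text_runs(data):
    """Run the stand-in parser over data and return the filled handler."""
    handler = TextRunHandler()
    xml.sax.parseString(data, handler)
    return handler


if __name__ == "__main__":
    result = parse_text_runs(SAMPLE)
    print(result.texts)
    print(result.events[:3])
```

In this sketch a different event source could drive the same handler
methods without any change to the handler itself.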
Most of the work in the OOXML import/export project, however, will be
spent in other areas:
- Implementing features that exist in MS Office but not in OpenOffice.
Examples are SmartArt shapes (for all applications).
- Improving features in OpenOffice that are not working as well as they
should/could. Examples are pivot tables in Calc or the slide show in
Impress.
- Supporting existing features in OpenOffice that are just not handled by
the OOXML importer.
Regarding POI, there are several reasons not to use it:
- As said above, the existing import code is to be migrated to the new
framework. The new framework should offer an interface that supports
this migration.
- POI is implemented in Java.
- As far as I understand, POI (I don't find its documentation very
helpful) is more like a DOM tree with better access to its nodes than a
streaming parser. That would result in lower execution speed and larger
memory consumption.
- POI's support for OOXML / MS Office only goes up to 2007. That seems
like an undesirable restriction.
- The original naming (see http://en.wikipedia.org/wiki/Apache_POI) does
not imply professional development of the POI project.
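
The tree-versus-streaming point in the list above can be illustrated
with a small sketch. It says nothing about POI itself: it only uses
Python's standard xml.etree.ElementTree on a made-up document (the
element names "sheet" and "row" and all function names are
hypothetical) to show the two access styles. The tree variant keeps
every node in memory before any work is done; the streaming variant
handles each element as it is completed and then discards its content.

```python
import io
import xml.etree.ElementTree as ET


def make_document(n):
    """Build a made-up XML document with n <row> elements."""
    rows = "".join(f"<row id='{i}'>value {i}</row>" for i in range(n))
    return f"<sheet>{rows}</sheet>".encode("utf-8")


def count_rows_tree(data):
    """Tree style: build the complete tree first, then query it."""
    root = ET.fromstring(data)
    return len(root.findall("row"))


def count_rows_streaming(data):
    """Streaming style: react to each element when its end tag is seen."""
    count = 0
    for _event, elem in ET.iterparse(io.BytesIO(data), events=("end",)):
        if elem.tag == "row":
            count += 1
            # Drop the text and attributes of the handled element; the
            # emptied element itself stays attached to the root.
            elem.clear()
    return count


if __name__ == "__main__":
    doc = make_document(1000)
    print(count_rows_tree(doc), count_rows_streaming(doc))
```

Both functions return the same count; the difference is only in how
much of the document has to be held in memory at once.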
Regards,
Andre
Regards,
Andrea.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@openoffice.apache.org
For additional commands, e-mail: dev-h...@openoffice.apache.org