On Fri, Oct 16, 2020 at 2:05 PM Dave Fisher <wave4d...@comcast.net> wrote:
> Hi -
>
> Sent from my iPhone
>
> > On Oct 16, 2020, at 4:04 AM, Mechtilde <o...@mechtilde.de> wrote:
> >
> > Hello Joost,
> >
> > I'm very happy to read from you.
> >
> >> On 16.10.20 at 12:50, Joost Andrae wrote:
> >> Hi Simon,
> >>
> >> it's an honor to me to see a sign of life of you here. Welcome !
> >>
> >> Instead of user picking here to get users leave from AOO to LO a
> >> developer could create a Java based OOo/LO extension that uses Apache
> >> POI to export OpenDocument type documents to MSXML formats by using the
> >> binary MSO export to export those documents to the MSXML format in
> >> between. Or maybe it's possible to XSL this document format by using
> >> OpenOffice together with Apache POI. Using XSL scripts (in AOO menu item
> >> XML filter settings) to make document conversions is possible within OOo.
> >
> > I offer my help to test the implementation. sorry but I'm not a
> > programmer. So we as the project need help from Java programmers to work
> > on it and contribute it.
>
> I’m a PMC Member of Apache POI for over 12 years. My team donated the
> initial PowerPoint support and were involved in the initial support for
> OOXML.
>
> POI is embedded into Apache SOLr and Tika along with commercial products.
> The project took over the dormant XMLBeans project and is releasing a 4.0
> that supports modern Java.
>
> An OSGi bundle of POI will be available in the next release if you build
> from source.
>
> The Tika, POI, and PDFBox projects maintain a large regression corpus
> scraped from the internet using CommonCrawl. I’m sure that this could be
> shared in one way or another.
>
> Regards,
> Dave
>

Hi

I did start writing a POI-based OOXML export filter for AOO some years ago (search the dev mailing list), and got it to the point of being able to save very basic spreadsheets (no formulas, no formatting, just text and numbers). There were several major problems with using POI.

Firstly, the code in POI is at various stages of completeness.
The legacy XLS filter is very good, supports SAX parsing, etc. The DOC filter is minimal and unmaintained. What we would need, the OOXML filter for at least XLSX, is somewhere in between. AFAIK it only supports DOM parsing, meaning everything needs to be in memory before it can be written to disk, so a big spreadsheet could consume gigabytes of RAM during saving, and if you don't have enough memory free, you can't save!

Also, I do use POI at work, and it's outstanding for parsing spreadsheets (it can even parse some that AOO can't), but it's very memory hungry. A spreadsheet with 100000 rows consumed 6 GB of RAM, compared to 200 MB in LO (30 times less). That isn't really POI's fault: Java has too much per-object overhead, and there are a great many objects in a spreadsheet that big. So DOM + Java really do not add up to efficient memory usage. By comparison, our current OOXML reading is not only SAX-based, but converts XML tags to integers for faster comparisons and lower memory usage.

Finally, AOO itself had limitations that made developing a filter in Java difficult. Each sheet in a spreadsheet has 1 billion cells. Obviously only a minority of these contain data; most are empty. In C++ there are special iterators that can be used to access only the non-empty cells, but these are not exposed to UNO or, through it, to Java. The only way to tell which cells are in use is to iterate over all 1 billion cells (per sheet), which is hopelessly slow.

Some of these problems can be solved. We can expose the cell iterators over UNO. The memory usage might not matter that much in practice, and we could patch POI to do SAX parsing/saving at a later stage. But users expect fonts, styles, charts, images, custom formats, OLE, pivot tables, VBA macros, form controls, mathematical formulas, change tracking, etc. all saved losslessly and 100% compatible with Excel, which requires work not only in the filter but in the rest of AOO too, and POI probably doesn't support all of those features either.

I might get back into this next month, especially if others want to collaborate, but don't expect something generally usable, let alone Excel-quality XLSX saving, any time soon.

Regards
Damjan
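
To illustrate the cell-iteration problem described above, here is a minimal Java sketch comparing a cell-by-cell scan of a mostly empty sheet with an iteration over only the stored cells. Everything in it is hypothetical: the class `SparseSheetSketch`, its methods, and the deliberately small grid size are made up for illustration and are not the AOO, UNO, or POI API.

```java
import java.util.Map;
import java.util.TreeMap;

final class SparseSheetSketch {
    // Hypothetical, deliberately small dimensions so the dense scan finishes quickly.
    static final int ROWS = 2_000;
    static final int COLS = 500;

    // Only non-empty cells are stored, keyed by row * COLS + col.
    private final TreeMap<Long, String> cells = new TreeMap<>();

    // Number of cells the most recent counting strategy had to look at.
    long visited = 0;

    void set(int row, int col, String value) {
        cells.put((long) row * COLS + col, value);
    }

    // Cell-by-cell access, standing in for a client that has no iterator available.
    String get(int row, int col) {
        visited++;
        return cells.get((long) row * COLS + col);
    }

    // Strategy 1: probe every cell of the grid and count the non-empty ones.
    int countByDenseScan() {
        visited = 0;
        int nonEmpty = 0;
        for (int r = 0; r < ROWS; r++) {
            for (int c = 0; c < COLS; c++) {
                if (get(r, c) != null) {
                    nonEmpty++;
                }
            }
        }
        return nonEmpty;
    }

    // Strategy 2: walk only the stored entries, as a non-empty-cell iterator would.
    int countBySparseIteration() {
        visited = 0;
        int nonEmpty = 0;
        for (Map.Entry<Long, String> entry : cells.entrySet()) {
            visited++;
            nonEmpty++;
        }
        return nonEmpty;
    }

    public static void main(String[] args) {
        SparseSheetSketch sheet = new SparseSheetSketch();
        sheet.set(0, 0, "header");
        sheet.set(1, 0, "42");
        sheet.set(1_999, 499, "last");

        int dense = sheet.countByDenseScan();
        long denseVisited = sheet.visited;
        int sparse = sheet.countBySparseIteration();
        long sparseVisited = sheet.visited;

        System.out.println("dense:  " + dense + " found, " + denseVisited + " visited");
        System.out.println("sparse: " + sparse + " found, " + sparseVisited + " visited");
    }
}
```

Both strategies find the same cells, but the dense scan visits every position in the grid while the sparse iteration visits only the stored entries. The gap grows with the sheet size, which is why a scan over a full-size sheet is impractical.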