[ 
https://issues.apache.org/jira/browse/COR-18?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jan iversen updated COR-18:
---------------------------
    Component/s: DocFormats - platform

> Replacing MiniZip
> -----------------
>
>                 Key: COR-18
>                 URL: https://issues.apache.org/jira/browse/COR-18
>             Project: Corinthia
>          Issue Type: New Feature
>          Components: DocFormats - platform
>         Environment: source
>            Reporter: jan iversen
>            Assignee: jan iversen
>            Priority: Blocker
>             Fix For: 0.5
>
>
> MiniZip is a bit thin and, because of some changes needed, it might be better 
> to replace it in the DocFormats/3rdparty/external/ folder, as @peterkelly 
> observes at #26 (comment)
> Easy Steps
> For now, it might be desirable to simply replace the current code with 
> MiniZip 1.1 from http://www.winimage.com/zLibDll/minizip.html
> Since it is a simple dependency, this should work fine so long as there are 
> no breaking API changes in between 1.0h and 1.1.
> Eventually?
> It would be good to have something behind a stable API that permits random 
> access for reading file streams as Peter suggests. Ideally, that API would be 
> aligned around the Document Container File (DCF) profile of the official 
> PKWare specification that is used commonly among ePub, ODF, and the Open 
> Packaging Conventions (OPC) used in OOXML and elsewhere. I don't know what 
> the latest status of that profile is at ISO/IEC JTC1 SC34, but it will become 
> a common international specification for this specialized usage of Zip as a 
> compound document-format container file.
> There are other places to look for ideas and possible sources of reusable 
> code and API considerations, including in Apache OpenOffice, the Apache ODF 
> Toolkit (using Java), and the Microsoft open-sourcing of its OOXML-access 
> layer (in .NET I think). And the Microsoft platform has some native support 
> that it might be useful to be able to rely on in Windows-targeted builds.
> There is also a CodePlex LibOPC project that is C code under a BSD-form 
> license at https://libopc.codeplex.com/ One interesting feature of LibOPC 
> that may interest Apache OpenOffice folk (i.e., @janiversen) is a python 
> script for generating Visual Studio projects that can be used for 
> manipulating and building on Windows.
> One caveat. For ingesting Zip-based document files, there needs to be a fair 
> amount of code to ensure resiliency and defense against DOS-ing of 
> applications with malformed document files. That may have to be grown, with 
> attention to the code footprint on limited-capacity devices (where presumably 
> some of the heavy-lifting is off-loaded to the cloud). An interesting 
> feature of the OPC specification is that it is also designed to support 
> remoting of the document streams in a way where there is no requirement that 
> a Zip file be transferred to the client. That may be very much eventually, 
> but it is useful to think about having an API that would allow for that 
> underneath.
> Lest we forget?
> Although this is all .NET-fu, there may be useful ideas in this project,
> https://github.com/OfficeDev/Open-Xml-Sdk
> (and some of the system-level dependencies may have 
> Native Windows counterparts as well). This might be useful for mining for 
> other ideas higher up in the API modeling too.
> ---
> I didn't think to mention POI and whatever they use as a model close to the 
> Zip packages.
> I didn't realize until looking at the proposal to become an Apache incubator 
> project that the sources for minizip and tidy-html5 are not pristine. It 
> would be good to reconstruct the modification process and leave more 
> footprints if the changes are not in the repository here. (Actually, it would 
> be good to reconstruct the modification anyhow, but diffs from git would be 
> helpful.)
> I'm thinking that there is no hurry to replace these in early stages. If a 
> better API is desired, the first step of getting that in place would be to 
> build a shim that goes from that API to anything at hand at first, such as 
> minizip or some other library, and worry about fit and performance later.
> jan: 
> POI is in Java, so they have other packages available.
> I am currently working on expanding the platform part to also include zip and 
> html, so that we can change the libraries at a later stage. I think your idea 
> of using libOPC is valid and interesting...you, Peter and Svante know better 
> whether it fits the project.
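
The shim idea in the description (a stable API in front of whatever Zip
library is at hand, with random access to named streams and some defense
against malformed files) could look roughly like the sketch below. It is
written in Python only for brevity and is not Corinthia code: PackageReader,
ZipfilePackageReader, build_sample and the 16 MiB limit are all hypothetical
names and values chosen for illustration, and Python's standard zipfile
module stands in for minizip or any other backend.

```python
import io
import zipfile
from abc import ABC, abstractmethod
from typing import List


class PackageReader(ABC):
    """Hypothetical stable interface: callers never touch the Zip library."""

    @abstractmethod
    def names(self) -> List[str]:
        """Return the entry names in archive order."""

    @abstractmethod
    def read(self, name: str) -> bytes:
        """Return one entry's bytes, addressed by name (random access)."""


class ZipfilePackageReader(PackageReader):
    """Shim backed by Python's zipfile; another backend could replace it."""

    def __init__(self, source, max_entry_bytes: int = 16 * 1024 * 1024):
        self._zip = zipfile.ZipFile(source)
        self._max = max_entry_bytes  # illustrative limit, not from the issue

    def names(self) -> List[str]:
        return self._zip.namelist()

    def read(self, name: str) -> bytes:
        info = self._zip.getinfo(name)  # raises KeyError for unknown names
        if info.file_size > self._max:
            raise ValueError(f"{name}: declared size exceeds limit")
        # The declared size can be forged, so also cap what is decompressed.
        with self._zip.open(info) as stream:
            data = stream.read(self._max + 1)
        if len(data) > self._max:
            raise ValueError(f"{name}: decompressed size exceeds limit")
        return data


def build_sample() -> io.BytesIO:
    """Build a tiny in-memory package so the sketch runs without files."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as z:
        z.writestr("mimetype", "application/vnd.oasis.opendocument.text")
        z.writestr("content.xml", "<doc/>")
    buf.seek(0)
    return buf


if __name__ == "__main__":
    reader = ZipfilePackageReader(build_sample())
    print(reader.names())
    print(reader.read("content.xml"))
```

Because callers depend only on the abstract interface, the backend can be
swapped later (for example for libOPC or a newer MiniZip) without touching
them. The size cap is only one of the resiliency checks a real reader would
need.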



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
