On 1 August 2015 at 17:41, Peter Kelly <pmke...@apache.org> wrote:

> Hi Jan,
>
> I’ll get to your question in a moment, but I just checked out the
> newZipExperiment branch and noticed that almost all of the source files
> have changed (I was expecting a relatively small diff, with only a few
> files changed). It looks like most of these differences are due to
> reordering the #includes at the top of each source file. If we’re going to
> do this, could we make it a separate commit in master, so it’s easier to
> see exactly what has changed in the zip branch?
>

We are not going to do this, it was me being religious for a moment.
I need a FILE * in the zipHandle structure, and did not like void *.

> Actually I normally intentionally put system headers after other headers
> in the project, as it helps to detect cases where a custom header depends
> on types declared in a system header, and thus for which importing that
> header (by itself) in a source file would result in compilation errors due
> to the missing references. For example DFBuffer.h has an #include
> <stdarg.h> at the top since some of the functions take the va_list data
> type. If one of us uses such a type in another header which doesn’t have
> #include <stdarg.h>, then any C file that imports it (directly or
> indirectly) has to remember to explicitly include stdarg.h (and that could
> be a *lot* of files, if the header is referenced from lots of places). So
> by placing any system includes needed by the source file after all
> custom headers, we can pick up on these errors more easily.
>

This is actually how we agreed on it; you will see a newExperiment2 without all these changes.

> Regarding the zip file format, I need to look up some stuff and will
> get back to you shortly. But I suspect some of the duplication may be
> related to the fact that a zip file is meant to be read backwards. Rather
> than starting at the beginning of the file, reading begins at the end,
> working backwards through the file to find potentially multiple copies of
> the directory listing. This serves two purposes:
>
> 1) You can “modify” the contents of a zip file simply by appending (with
> the compressed content of new/changed files added, and a new directory
> listing including these files, and *not* including any files which have been
> “deleted”, i.e. masked out).
>
> 2) A zip file can be appended to the end of another file format; the most
> common example being self-extracting .exe files.
> Since .exe files are read
> from the beginning, the program loader on Windows doesn’t care about the
> fact that there’s trailing data at the end. And it’s still a valid zip
> file, since the .exe content at the start is ignored when reading the
> directory listing.
>
> I think you may be aware of some of these details already, and there are
> some nuances I’ve probably missed. I’m about to have a look through the
> code you currently have in the branch.
>

Painfully aware. I am slowly including code from an old project of mine, which is so old that I have forgotten why I did things.

I expect to have the open/read part finished in some hours, otherwise it will be delayed to Monday. I also have an experimental write (only local, not committed).

Thanks for taking a look. I will write in here when I consider the open/read ready for master. I would like to move that to master before I do the write part.

rgds
jan i.

> --
> Dr Peter M. Kelly
> pmke...@apache.org
>
> PGP key: http://www.kellypmk.net/pgp-key
> (fingerprint 5435 6718 59F0 DD1F BFA0 5E46 2523 BAA1 44AE 2966)
>
> > On 1 Aug 2015, at 4:33 pm, jan i <j...@apache.org> wrote:
> >
> > Hi
> >
> > Does anybody know why zip has a mad inefficient directory structure ?
> >
> > I try to understand the why, but fail.
> >
> > A zip file contains 1 global directory with information about every
> > single file (flat structure, no sub directories, but filenames may
> > contain a "/"). That is logical and expected.
> >
> > BUT in front of every file, there is a local file header, with the
> > filename and about 3/4 of the information from the global directory.
> > This information seems purely redundant and unneeded.
> >
> > What am I missing here ? On one of my test docx, the local headers are
> > about 10% of the filesize (looong filenames) which could be thrown away.
> >
> > Hope somebody can see what I failed to see.
> > rgds
> > jan i.
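
For anyone following the thread, the "read backwards" point can be illustrated with a rough sketch of how a reader might locate the end-of-central-directory record by scanning from the end of the data. This is not the code in the newZipExperiment branch; the function name find_eocd and the in-memory buffer interface are assumptions made for illustration only. The signature value (0x06054b50), the 22-byte minimum record size and the 16-bit comment length field come from the zip format itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EOCD_SIGNATURE   0x06054b50u  /* "PK\005\006" stored little-endian */
#define EOCD_MIN_SIZE    22u          /* fixed part of the record */
#define EOCD_MAX_COMMENT 65535u       /* comment length is a 16-bit field */

/* Zip stores all multi-byte integers little-endian. */
static uint16_t read_le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Hypothetical helper: scan backwards from the end of buf for the
 * end-of-central-directory record. Returns its offset, or -1 if none
 * is found. Anything in front of the zip data (for example a
 * self-extracting .exe stub) is never looked at.
 */
long find_eocd(const unsigned char *buf, size_t len)
{
    assert(buf != NULL);
    if (len < EOCD_MIN_SIZE)
        return -1;

    /* The record cannot start earlier than max comment + fixed size from the end. */
    size_t lowest = 0;
    if (len > EOCD_MIN_SIZE + EOCD_MAX_COMMENT)
        lowest = len - (EOCD_MIN_SIZE + EOCD_MAX_COMMENT);

    for (size_t pos = len - EOCD_MIN_SIZE; ; pos--) {
        if (read_le32(buf + pos) == EOCD_SIGNATURE) {
            /* Sanity check: the record plus its comment must end exactly at EOF. */
            uint16_t comment_len = read_le16(buf + pos + 20);
            if (pos + EOCD_MIN_SIZE + comment_len == len)
                return (long)pos;
        }
        if (pos == lowest)
            break;
    }
    return -1;
}
```

From the returned offset, the 32-bit field at offset + 16 holds the position of the central directory, which is where a reader would go next. The sketch requires the record to end exactly at the end of the data, so input with extra trailing bytes after the zip would need a more forgiving check.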