RE: DFStorage

Dennis E. Hamilton Thu, 01 Jan 2015 13:09:19 -0800

I have a question that may just be one of nomenclature, ...

 -- replying below to --
From: Peter Kelly [mailto:[email protected]] 
Sent: Thursday, January 1, 2015 02:58
To: [email protected]
Subject: DFStorage


I realise that I haven’t done a very good job of documenting the code in 
Corinthia, as you’ve probably noticed :) I’ve been meaning to get around to 
this for a while now.

[ ... ]

Now with the current implementation, which is a very simplistic one, it simply 
reads the whole zip file into memory. This is largely due to a limitation in 
the minizip API, which enforces sequential access to the entries in a file. It 
would be conceivable to have the zip DFStorage implementation first read a 
directory listing, and then for each file that’s requested, do a linear scan 
through the entries until it finds the requested file, and then read that. 
Each lookup would be an O(n) operation, but this would be unlikely to be a major 
problem since most zip packages we’re dealing with will only have a fairly 
small number of entries.

Minizip does not provide any way to cache the location in the zip file of a 
particular entry, even though this information would be possible to obtain in 
theory (just not through minizip’s API). If I were writing a zip implementation 
from scratch (and maybe this is something we could consider), I would have it 
read a list of all entries and remember their locations in a hash table, so 
that when a particular named entry is requested, we can go directly to that 
point in the file without having to do a linear scan.
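
As a rough illustration of the hash-table idea above, here is a minimal 
sketch in Python (not Corinthia's C code and not minizip's API). It assumes 
the whole archive is already in memory as bytes, that there is no ZIP64 
extension and no archive comment containing the end-of-directory signature, 
and that entry names are UTF-8. The names build_index, read_entry and 
make_sample are hypothetical helpers invented for this example.

```python
import io
import struct
import zipfile
import zlib

EOCD = struct.Struct("<4s4H2LH")     # end-of-central-directory record, 22 bytes
CDFH = struct.Struct("<4s6H3L5H2L")  # central directory file header, 46 bytes
LFH = struct.Struct("<4s5H3L2H")     # local file header, 30 bytes


def build_index(data):
    """Map each entry name to (local header offset, compressed size, method)."""
    # Assumption: the last occurrence of the signature is the real record.
    pos = data.rfind(b"PK\x05\x06")
    if pos < 0:
        raise ValueError("no end-of-central-directory record found")
    fields = EOCD.unpack_from(data, pos)
    total_entries, cd_offset = fields[4], fields[6]
    index = {}
    p = cd_offset
    for _ in range(total_entries):
        f = CDFH.unpack_from(data, p)
        if f[0] != b"PK\x01\x02":
            raise ValueError("bad central directory signature")
        method, csize = f[4], f[8]
        nlen, xlen, clen, offset = f[10], f[11], f[12], f[16]
        name = data[p + CDFH.size:p + CDFH.size + nlen].decode("utf-8")
        index[name] = (offset, csize, method)
        # Skip the variable-length name, extra field and comment.
        p += CDFH.size + nlen + xlen + clen
    return index


def read_entry(data, index, name):
    """Jump straight to one entry using the index; no linear scan."""
    offset, csize, method = index[name]
    h = LFH.unpack_from(data, offset)
    if h[0] != b"PK\x03\x04":
        raise ValueError("no local file header at recorded offset")
    start = offset + LFH.size + h[9] + h[10]  # skip name and extra field
    raw = data[start:start + csize]
    if method == 0:                           # stored
        return raw
    if method == 8:                           # deflate, raw stream
        return zlib.decompress(raw, -15)
    raise ValueError("unsupported compression method")


def make_sample():
    """Build a small two-entry archive in memory to try the functions on."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("content.xml", b"<content/>" * 50)
        zf.writestr("styles.xml", b"<styles/>" * 50)
    return buf.getvalue()
```

With this, read_entry(data, build_index(data), "styles.xml") returns the 
uncompressed bytes of that entry. The index is built once per archive, so 
each later lookup is a dictionary access followed by one seek.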

[ ... ]

<orcmid>
   @Peter, I want to verify that we have the same understanding of the Zip file.

   The Zip file itself has a global directory to all of the component files at 
the end of the file.  The global directory provides offsets to where each 
component file begins in the Zip stream and also provides other pertinent 
information.

   To produce a Zip file, minizip would need to remember all of this to append 
to the stream once all of the part files are written out.

   The global directory could certainly be cached and, if necessary, indexed from 
a hash table on the names of the component parts.  

   Without looking at minizip, I would assume that there has to be some 
internal representation of the global directory even if it is not exposed.  
Would it be useful to exploit that somehow in exposing a better API?

   So long as the Zip stream can be read via random access, it is normal to 
access the global directory first and then access the parts based on the global 
directory, even if access is in sequential order of those parts in the stream.  
That helps detect apparent corruption of the Zip and it is essential when the 
header for a component file does not specify the length of the file data.
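
   The directory-first approach described above can be sketched as follows. 
This is a hedged Python illustration, not minizip or Corinthia code. It uses 
the standard zipfile module only to parse the directory at the end of the 
file, then checks each local header against it. The function name 
check_local_headers is hypothetical, and the sketch assumes plain 
(non-ZIP64) archives with ASCII or UTF-8 names.

```python
import io
import struct
import zipfile

LFH = struct.Struct("<4s5H3L2H")  # fixed part of a local file header, 30 bytes
HAS_DATA_DESCRIPTOR = 0x0008      # general-purpose flag bit 3


def check_local_headers(data):
    """Compare each local header with its directory entry; return problems."""
    problems = []
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        # infolist() is built from the directory at the end of the file.
        entries = zf.infolist()
    for info in entries:
        fields = LFH.unpack_from(data, info.header_offset)
        sig, flags, csize, nlen = fields[0], fields[2], fields[7], fields[9]
        if sig != b"PK\x03\x04":
            problems.append(
                f"{info.filename}: no local header at offset {info.header_offset}")
            continue
        start = info.header_offset + LFH.size
        local_name = data[start:start + nlen].decode("utf-8", "replace")
        if local_name != info.filename:
            problems.append(f"{info.filename}: local header names {local_name!r}")
        if flags & HAS_DATA_DESCRIPTOR:
            # The local header defers its sizes to a trailing descriptor,
            # so the directory's compress_size is the length to rely on.
            continue
        if csize != info.compress_size:
            problems.append(f"{info.filename}: size mismatch with directory")
    return problems
```

   An empty result means every directory offset points at a local header 
with a matching name and, where the local header states one, a matching 
compressed size.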

   Does this square with your understanding of what is involved in minizip 
operation?
</orcmid>
