On 4 September 2011 07:04, Stefan Bodewig <bode...@apache.org> wrote:
> Hi,
>
> I've just committed Converter*Stream implementations for Pack200[1]
> which are a bit unusual in several ways.
>
> First of all, they will (by design of the format) only work on
> compressing valid jar files. Actually the result isn't likely to be
> compressed (in the sense of "smaller than the original") at all, but
> expects another step of GZip compression in most cases.
>
> The second difference to the other compressors is that the API
> provided by the Java classlib doesn't lend itself to streaming at
> all. There is a Packer/Unpacker that expects an InputStream and an
> OutputStream and converts from one to the other in a single blocking
> operation (even closing the input side when done).
>
> I have experimented with Piped*Streams as well as Ant/commons-exec-like
> stream pumping in order to provide a streaming experience, but always
> ran into some edge cases where things broke down. I'll give one
> example below.
>
> The current implementation of Pack200CompressorInputStream passes the
> wrapped input and an OutputStream writing to a cache to the Unpacker
> synchronously inside the constructor, consuming the input completely.
> It then defers all read operations to the cache.
>
> Likewise, Pack200CompressorOutputStream buffers up all write
> operations in a cache; once finish() or close() is called, the cache
> is converted to an InputStream that is then passed together with the
> originally wrapped output to the Packer and written synchronously.
>
> Caches can be in-memory (using ByteArray*Streams) or temporary files,
> controlled by a constructor option, with in-memory as the default and
> temp files for cases where the archives are expected to be big.
>
> Because of this design the byte-count methods don't make any sense (do
> we count when data is written to/read from the cache, or while the
> (Un)Packer is doing its work?) and haven't been implemented at all.
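For anyone following along, the cache-based input side can be sketched roughly like this. The class name and the byte-flipping "converter" are illustrative stand-ins of mine, not the committed code; the real class hands the wrapped input and a cache-backed OutputStream to Pack200's Unpacker, but the shape is the same: one synchronous blocking conversion in the constructor, then every read served from the cache.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

public class CachingConverterInputStream extends FilterInputStream {

    public CachingConverterInputStream(InputStream wrapped) throws IOException {
        // Consume the wrapped input completely before the first read().
        super(convertEagerly(wrapped));
    }

    private static InputStream convertEagerly(InputStream in) throws IOException {
        ByteArrayOutputStream cache = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1) {
            cache.write(b ^ 0xFF); // stand-in for Unpacker.unpack(in, cache)
        }
        in.close(); // the real Unpacker closes the input side when done, too
        return new ByteArrayInputStream(cache.toByteArray());
    }

    public static void main(String[] args) throws IOException {
        byte[] original = {0x00, 0x0F, (byte) 0xFF};
        InputStream s = new CachingConverterInputStream(
                new ByteArrayInputStream(original));
        System.out.println(Arrays.toString(s.readAllBytes()));
        // prints [-1, -16, 0]: every byte flipped by the stand-in converter
    }
}
```

The temp-file variant of the cache would only change what `convertEagerly` writes to and reads back from.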
> The class names StreamMode and StreamSwitcher result from my attempts
> at using real streams and should be changed unless anybody else comes
> up with a working streaming solution.
>
> The biggest hurdle for any streaming solution is that there is always
> going to be some sort of intermediate buffer. Something picks up data
> written to the output stream and makes it available to the input
> stream side. Once that buffer is full, nothing more can be written
> unless somebody reads the input in a timely manner.
>
> In the case of a Pack200CompressorInputStream you don't have any
> control over when the user code is going to read the data, or whether
> it is going to consume all of it at all. For example, if the stream is
> wrapped in a ZipArchiveInputStream (it represents a JAR, after all),
> it is never going to be consumed completely, because the archive
> contains ZIP data at the end that is ignored by the input stream
> implementation.
>
> There are more cases where the Pack/Unpack operation would end up
> blocked, so I decided to only code the more robust indirect solution
> for now.
Sounds good. You must have put a lot of thought into this; it would be
useful to record these design decisions and investigations in the code
somewhere, e.g. as package Javadoc.

> Stefan
>
> [1]
> http://download.oracle.com/javase/1.5.0/docs/api/java/util/jar/Pack200.html
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> For additional commands, e-mail: dev-h...@commons.apache.org