On 4 September 2011 07:04, Stefan Bodewig <bode...@apache.org> wrote:
> Hi,
>
> I've just committed Compressor*Stream implementations for Pack200[1]
> which are a bit unusual in several ways.
>
> First of all it will (by design of the format) only work on valid jar
> files.  Actually the result isn't likely to be compressed (in the
> sense of "smaller than the original") at all; in most cases it is
> expected to go through another step of GZip compression.
>
> The second difference to the other compressors is that the API provided
> by the Java classlib doesn't lend itself to streaming at all.  There is
> a Packer/Unpacker that expects an InputStream and an OutputStream and
> converts from one to the other in a single blocking operation (even
> closing the input side when done).
>
> I have experimented with Piped*Streams as well as Ant/commons-exec-like
> stream pumping in order to provide a streaming experience but always ran
> into some edge cases where things broke down.  I'll give one example
> below.
>
> The current implementation of Pack200CompressorInputStream will
> pass the wrapped input and an OutputStream writing to a cache to the
> Unpacker synchronously inside the constructor, consuming the input
> completely.  It will then defer all read-operations to the cache.
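
For the record, the eager approach described above could look roughly
like the sketch below.  This is not the committed code: BlockingConverter
is a hypothetical stand-in for the Unpacker, the toy upper-casing
converter replaces real unpacking, and the cache is always in memory.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-in for a blocking, one-shot converter such as the Unpacker. */
interface BlockingConverter {
    void convert(InputStream in, OutputStream out) throws IOException;
}

/** Runs the whole conversion inside the constructor, then serves reads from the cache. */
class EagerConvertingInputStream extends FilterInputStream {

    EagerConvertingInputStream(InputStream wrapped, BlockingConverter converter)
            throws IOException {
        super(convertToCache(wrapped, converter));
    }

    private static InputStream convertToCache(InputStream wrapped, BlockingConverter converter)
            throws IOException {
        ByteArrayOutputStream cache = new ByteArrayOutputStream();
        converter.convert(wrapped, cache); // blocks until the input is fully consumed
        return new ByteArrayInputStream(cache.toByteArray());
    }
}

final class EagerDemo {
    /** Toy converter (assumption for illustration): upper-cases ASCII letters. */
    static final BlockingConverter UPPERCASE = (in, out) -> {
        int b;
        while ((b = in.read()) != -1) {
            out.write(Character.toUpperCase(b));
        }
    };

    /** Pushes text through the eager stream and returns what can be read back. */
    static String convert(String text) {
        InputStream source =
                new ByteArrayInputStream(text.getBytes(StandardCharsets.US_ASCII));
        try (InputStream in = new EagerConvertingInputStream(source, UPPERCASE)) {
            ByteArrayOutputStream result = new ByteArrayOutputStream();
            int b;
            while ((b = in.read()) != -1) {
                result.write(b);
            }
            return new String(result.toByteArray(), StandardCharsets.US_ASCII);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reports how many source bytes are left once the constructor has returned. */
    static int unreadSourceBytesAfterConstruction(String text) {
        ByteArrayInputStream source =
                new ByteArrayInputStream(text.getBytes(StandardCharsets.US_ASCII));
        try (InputStream in = new EagerConvertingInputStream(source, UPPERCASE)) {
            return source.available();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(convert("hello"));
    }
}
```

The point of the sketch is that the wrapped input is already drained
when the constructor returns, so later reads can never block on the
converter.
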
>
> Likewise the Pack200CompressorOutputStream will buffer up all write
> operations in a cache and once finish() or close() is called the cache
> is converted to an InputStream that is then passed together with the
> originally wrapped output to the Packer and written synchronously.
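
The output side can be sketched the same way.  Again this is only an
illustration under assumptions: BlockingConverter is a hypothetical
stand-in for the Packer, the converter is a toy, and the cache is a
plain ByteArrayOutputStream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-in for a blocking, one-shot converter such as the Packer. */
interface BlockingConverter {
    void convert(InputStream in, OutputStream out) throws IOException;
}

/** Buffers every write and converts the buffered data in one go on finish()/close(). */
class DeferredConvertingOutputStream extends OutputStream {
    private final OutputStream wrapped;
    private final BlockingConverter converter;
    private final ByteArrayOutputStream cache = new ByteArrayOutputStream();
    private boolean finished;

    DeferredConvertingOutputStream(OutputStream wrapped, BlockingConverter converter) {
        this.wrapped = wrapped;
        this.converter = converter;
    }

    @Override
    public void write(int b) throws IOException {
        ensureNotFinished();
        cache.write(b);
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        ensureNotFinished();
        cache.write(b, off, len);
    }

    /** Runs the conversion once; later calls are no-ops. */
    public void finish() throws IOException {
        if (!finished) {
            finished = true;
            converter.convert(new ByteArrayInputStream(cache.toByteArray()), wrapped);
        }
    }

    @Override
    public void close() throws IOException {
        try {
            finish();
        } finally {
            wrapped.close();
        }
    }

    private void ensureNotFinished() throws IOException {
        if (finished) {
            throw new IOException("Stream has already been finished");
        }
    }
}

final class DeferredDemo {
    /** Toy converter (assumption for illustration): upper-cases ASCII letters. */
    static final BlockingConverter UPPERCASE = (in, out) -> {
        int b;
        while ((b = in.read()) != -1) {
            out.write(Character.toUpperCase(b));
        }
    };

    /** Returns "<sink size before finish()>:<sink content after finish()>". */
    static String run(String text) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (DeferredConvertingOutputStream out =
                     new DeferredConvertingOutputStream(sink, UPPERCASE)) {
            out.write(text.getBytes(StandardCharsets.US_ASCII));
            int before = sink.size();
            out.finish();
            return before + ":" + new String(sink.toByteArray(), StandardCharsets.US_ASCII);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run("abc"));
    }
}
```

Nothing reaches the wrapped stream until finish() runs, which is also
why a byte count taken during write() would not say much.
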
>
> Caches can be in-memory (using ByteArray*Stream) or temporary files
> controlled by a constructor option with in-memory as the default and
> temp-files for cases where the archives are expected to be big.
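
One way to picture the two cache flavours is a small abstraction with an
in-memory and a temp-file variant.  All names here (CacheMode,
StreamCache and friends) are made up for the sketch and are not the
names used in the commit.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical constructor option selecting where the cache lives. */
enum CacheMode { IN_MEMORY, TEMP_FILE }

/** Hypothetical cache: written once in full, then read back once. */
abstract class StreamCache {
    abstract OutputStream output();
    abstract InputStream openForReading() throws IOException;
    abstract void discard() throws IOException;

    static StreamCache create(CacheMode mode) throws IOException {
        return mode == CacheMode.TEMP_FILE ? new TempFileCache() : new InMemoryCache();
    }
}

/** Default flavour: everything is kept on the heap. */
final class InMemoryCache extends StreamCache {
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();

    @Override OutputStream output() { return buffer; }

    @Override InputStream openForReading() {
        return new ByteArrayInputStream(buffer.toByteArray());
    }

    @Override void discard() { buffer.reset(); }
}

/** Flavour for big archives: data is spooled to a temporary file. */
final class TempFileCache extends StreamCache {
    private final Path file;
    private final OutputStream out;

    TempFileCache() throws IOException {
        file = Files.createTempFile("stream-cache", ".tmp");
        out = Files.newOutputStream(file);
    }

    @Override OutputStream output() { return out; }

    @Override InputStream openForReading() throws IOException {
        out.close(); // make sure everything is on disk before reading it back
        return Files.newInputStream(file);
    }

    @Override void discard() throws IOException {
        out.close();
        Files.deleteIfExists(file);
    }
}

final class CacheDemo {
    /** Writes text into a cache of the given flavour and reads it back. */
    static String roundTrip(CacheMode mode, String text) {
        try {
            StreamCache cache = StreamCache.create(mode);
            try {
                cache.output().write(text.getBytes(StandardCharsets.UTF_8));
                ByteArrayOutputStream copy = new ByteArrayOutputStream();
                try (InputStream in = cache.openForReading()) {
                    int b;
                    while ((b = in.read()) != -1) {
                        copy.write(b);
                    }
                }
                return new String(copy.toByteArray(), StandardCharsets.UTF_8);
            } finally {
                cache.discard(); // reader is closed by now, so the file can go
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(CacheMode.TEMP_FILE, "cached via temp file"));
    }
}
```

Both flavours behave the same to the caller; only memory use and
clean-up differ.
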
>
> Because of this design the byte-count methods don't make any sense (do
> we count when data is written-to/read-from the cache or while the
> (Un)Packer is doing its work?) and haven't been implemented at all.
>
> The class names StreamMode and StreamSwitcher result from my attempts at
> using real streams and should be changed unless anybody else comes up
> with a working streaming solution.
>
> The biggest hurdle for any streaming solution is that there is always
> going to be some sort of intermediate buffer.  Something picks up data
> written to the output stream and makes it available to the input stream
> side.  Once the buffer is full, nothing more can be written unless
> somebody reads from the input side in a timely manner.
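
The full-buffer stall is easy to reproduce with the JDK's piped streams
alone.  A minimal sketch, assuming a 1024 byte pipe and a 4096 byte
payload (both numbers are arbitrary) and using a 500 ms wait as a crude
way to observe that the writer is stuck:

```java
import java.io.IOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.UncheckedIOException;

final class PipeBackpressureDemo {

    /**
     * Returns true if the writer was still blocked while nobody was reading
     * and completed with all bytes delivered once the pipe was drained.
     */
    static boolean writerStallsUntilDrained(int pipeSize, int payloadSize) {
        try {
            PipedInputStream in = new PipedInputStream(pipeSize);
            PipedOutputStream out = new PipedOutputStream(in);

            Thread writer = new Thread(() -> {
                try {
                    out.write(new byte[payloadSize]); // blocks once the pipe buffer is full
                    out.close();
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            writer.start();

            writer.join(500);                   // nobody is reading during this wait
            boolean stalled = writer.isAlive(); // still stuck inside write()?

            // Only draining the pipe lets the writer make progress again.
            int total = 0;
            byte[] chunk = new byte[256];
            int n;
            while ((n = in.read(chunk)) != -1) {
                total += n;
            }
            writer.join();
            return stalled && total == payloadSize;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(writerStallsUntilDrained(1024, 4096));
    }
}
```

If the reading side never drains the pipe, the writing thread stays
blocked, which is the situation described above.
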
>
> In the case of a Pack200CompressorInputStream you don't have any control
> over when the user code is going to read the data and whether it is
> going to consume all of it at all.  For example if the stream is wrapped
> in a ZipArchiveInputStream (it represents a JAR, after all), it is never
> going to be consumed completely because the archive contains ZIP data at
> the end that is ignored by the input stream implementation.
>
> There are more cases where the Pack/Unpack operation would end up
> blocked so I decided to only code the more robust indirect solution for
> now.

Sounds good.

You must have put a lot of thought into this; it would be useful to
record these design decisions and investigations somewhere in the
code, e.g. as package Javadoc.

> Stefan
>
> [1] 
> http://download.oracle.com/javase/1.5.0/docs/api/java/util/jar/Pack200.html
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> For additional commands, e-mail: dev-h...@commons.apache.org
>
>
