Hi

while looking into generalizing COMPRESS-207 I realized that our
CompressorOutputStreams don't provide bytesWritten - unlike the
ArchiveOutputStreams - while the InputStreams all provide bytesRead.

And I also realized I didn't really know what bytesRead actually meant -
bytes read from the compressed stream or uncompressed bytes read from
*this* stream. A quick look into the implementation shows I'm not the
only one who is confused.

For the CompressorInputStream implementations it is the number of
uncompressed bytes. For the ArchiveInputStreams it is the number of
bytes read from the underlying stream. For ArchiveOutputStream the
picture is not as clear: zip seems to count the number of bytes written
to the underlying stream, while ar (which doesn't compress anything)
does not count the extra archive header it writes, for example.
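
To make the two meanings concrete, here is a minimal sketch that uses
only java.util.zip rather than our own stream classes. ByteCountDemo,
CountingInputStream and measure are hypothetical names made up for this
illustration. It reports both candidate values for "bytes read" on the
same gzip stream: the bytes pulled from the underlying stream and the
uncompressed bytes handed to the caller.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class ByteCountDemo {

    /** Hypothetical helper: counts the bytes pulled from the wrapped stream. */
    static final class CountingInputStream extends FilterInputStream {
        private long count;

        CountingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            int b = super.read();
            if (b >= 0) {
                count++;
            }
            return b;
        }

        @Override
        public int read(byte[] buf, int off, int len) throws IOException {
            int n = super.read(buf, off, len);
            if (n > 0) {
                count += n;
            }
            return n;
        }

        long getCount() {
            return count;
        }
    }

    /** Returns { bytes read from the underlying stream, uncompressed bytes delivered }. */
    static long[] measure(byte[] plain) {
        try {
            // Compress the input into memory first.
            ByteArrayOutputStream sink = new ByteArrayOutputStream();
            try (GZIPOutputStream gz = new GZIPOutputStream(sink)) {
                gz.write(plain);
            }
            // Read it back, counting on both sides of the decompressor.
            CountingInputStream raw =
                new CountingInputStream(new ByteArrayInputStream(sink.toByteArray()));
            long uncompressed = 0;
            try (GZIPInputStream in = new GZIPInputStream(raw)) {
                byte[] buf = new byte[4096];
                int n;
                while ((n = in.read(buf)) != -1) {
                    uncompressed += n;
                }
            }
            return new long[] { raw.getCount(), uncompressed };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        long[] counts = measure(new byte[100_000]); // all zeros, compresses well
        System.out.println("read from underlying stream:  " + counts[0]);
        System.out.println("uncompressed bytes delivered: " + counts[1]);
    }
}
```

For highly compressible input the two numbers differ by orders of
magnitude, so a caller who assumes the wrong meaning gets a very
misleading value.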

I'm not really sure who uses the counts, but any attempt to make the
counts consistent is bound to break their expectations. Should we just
document the current state - and look into making ArchiveOutputStreams
consistent? What would be the "correct" choice for
CompressorOutputStreams? I'd probably prefer uncompressed bytes to
mirror CompressorInputStream.

This also raises the question of what to do with them in compress2
(unless that's a dead end). The current compress2 branch doesn't contain
any counts at all. I'd probably prefer to go with a notifier approach
like COMPRESS-207 suggests and drop the getBytesRead/Written methods
altogether. Whoever wants that information would subscribe to the
notifications that then would contain both byte counts.
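
A rough sketch of what such a notifier could look like, assuming a
callback that carries both counts. None of these names come from
COMPRESS-207 or the compress2 branch: ByteCountListener,
NotifyingGzipOutputStream and run are hypothetical, and
java.util.zip.GZIPOutputStream stands in for a real
CompressorOutputStream.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPOutputStream;

class NotifierSketch {

    /** Hypothetical callback that carries both byte counts. */
    interface ByteCountListener {
        void bytesProcessed(long uncompressedBytes, long compressedBytes);
    }

    /** Counts the bytes that actually reach the underlying stream. */
    static final class CountingOutputStream extends FilterOutputStream {
        long count;

        CountingOutputStream(OutputStream out) {
            super(out);
        }

        @Override
        public void write(int b) throws IOException {
            out.write(b);
            count++;
        }

        @Override
        public void write(byte[] b, int off, int len) throws IOException {
            out.write(b, off, len);
            count += len;
        }
    }

    /** Hypothetical compressing stream that notifies instead of exposing getters. */
    static final class NotifyingGzipOutputStream extends OutputStream {
        private final CountingOutputStream sink;
        private final GZIPOutputStream gzip;
        private final List<ByteCountListener> listeners = new ArrayList<>();
        private long uncompressed;

        NotifyingGzipOutputStream(OutputStream out) throws IOException {
            sink = new CountingOutputStream(out);
            gzip = new GZIPOutputStream(sink);
        }

        void addListener(ByteCountListener listener) {
            listeners.add(listener);
        }

        @Override
        public void write(int b) throws IOException {
            gzip.write(b);
            uncompressed++;
            fire();
        }

        @Override
        public void write(byte[] b, int off, int len) throws IOException {
            gzip.write(b, off, len);
            uncompressed += len;
            fire();
        }

        @Override
        public void close() throws IOException {
            gzip.close(); // flushes the remaining compressed data and the trailer
            fire();
        }

        private void fire() {
            for (ByteCountListener l : listeners) {
                l.bytesProcessed(uncompressed, sink.count);
            }
        }
    }

    /** Writes n zero bytes; returns { last uncompressed, last compressed, sink size }. */
    static long[] run(int n) {
        try {
            ByteArrayOutputStream target = new ByteArrayOutputStream();
            long[] last = new long[2];
            try (NotifyingGzipOutputStream out = new NotifyingGzipOutputStream(target)) {
                out.addListener((u, c) -> { last[0] = u; last[1] = c; });
                out.write(new byte[n]);
            }
            return new long[] { last[0], last[1], target.size() };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        long[] r = run(50_000);
        System.out.println("uncompressed=" + r[0] + " compressed=" + r[1]);
    }
}
```

With something along these lines the stream itself never has to decide
which count is the "right" one. The compressed count lags behind the
uncompressed one until close(), because the deflater buffers data
internally.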

Stefan
