[
https://issues.apache.org/jira/browse/COMPRESS-403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Stefan Bodewig resolved COMPRESS-403.
-------------------------------------
Resolution: Fixed
Fix Version/s: 1.15
All subtasks are resolved by now.
> Block and Record Size issues in TarArchiveOutputStream
> --------------------------------------------------------
>
> Key: COMPRESS-403
> URL: https://issues.apache.org/jira/browse/COMPRESS-403
> Project: Commons Compress
> Issue Type: Improvement
> Components: Archivers
> Affects Versions: 1.14
> Reporter: Simon Spero
> Priority: Minor
> Fix For: 1.15
>
>
> According to the pax spec
> [§4.100.13.01|
> http://pubs.opengroup.org/onlinepubs/009695399/utilities/pax.html#tag_04_100_13_01]
>
> bq. A pax archive tape or file produced in the -x pax format shall contain a
> series of blocks. The physical layout of the archive shall be identical to
> the ustar format
> [§ 4.100.13.06|
> http://pubs.opengroup.org/onlinepubs/009695399/utilities/pax.html#tag_04_100_13_06]
>
> bq. A ustar archive tape or file shall contain a series of logical records.
> Each logical record shall be a fixed-size logical record of 512 octets.
> ...
> bq. The logical records *may* be grouped for physical I/O operations, as
> described under the -b blocksize and -x ustar options. Each group of logical
> records *may* be written with a single operation equivalent to the write()
> function. On magnetic tape, the result of this write *shall* be a single tape
> physical block. The last physical block *shall* always be the full size, so
> logical records after the two zero logical records *may* contain undefined
> data.
> bq. pax. The default blocksize for this format for character special archive
> files *shall* be 5120. Implementations *shall* support all blocksize values
> less than or equal to 32256 that are multiples of 512.
> bq. ustar. The default blocksize for this format for character special
> archive files *shall* be 10240. Implementations *shall* support all blocksize
> values less than or equal to 32256 that are multiples of 512.
> bq. Implementations are permitted to modify the block-size value based on the
> archive format or the device to which the archive is being written. This is
> to provide implementations with the opportunity to take advantage of special
> types of devices, and it should not be used without a great deal of
> consideration as it almost certainly decreases archive portability.
> The current implementation of TarArchiveOutputStream:
> # allows the logical record size to be altered
> # has a default block size of 10240
> # has two separate logical-record-sized buffers, and frequently double-buffers
> in order to write to the wrapped output stream in units of a logical record,
> rather than a physical block.
> I would hazard a guess that very few users of commons-compress are writing
> directly to a tape drive, where the block size is of great import. It is
> also not possible to guarantee that a subordinate output stream won't buffer
> in chunks of a different size (5120 and 10240 bytes aren't ideal for modern
> hard drives with 4096-byte sectors, or for filesystems like ZFS with a
> default recordsize of 128K).
> The main effect the record and block sizes have is the extra padding they
> require. For the purposes of the Java output device, the optimal block size
> is probably just a single record; since all implementations must handle
> 512-byte blocks, and must detect the block size on input (or simulate doing
> so), this cannot affect compatibility.
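To make the padding cost concrete, here is an illustrative sketch of the arithmetic (the 100-byte payload and the class name are made up for this example; the 512-octet record and 10240-octet default block come from the spec quoted above): a single 100-byte file needs one header record, its data rounded up to one record, and two zero end-of-archive records, i.e. 2048 logical octets; the default block size then pads that to 10240 octets, while a 512-octet block adds nothing.

```java
// Illustrative arithmetic only: logical tar size for one small file,
// then the padded size under two block sizes.
public class TarPaddingDemo {
    static final int RECORD = 512; // ustar logical record, per POSIX

    /** Round n up to the next multiple of unit. */
    static long roundUp(long n, long unit) {
        return ((n + unit - 1) / unit) * unit;
    }

    /** Logical archive size: header record + padded data + two EOA records. */
    static long logicalSize(long fileSize) {
        return RECORD + roundUp(fileSize, RECORD) + 2L * RECORD;
    }

    public static void main(String[] args) {
        long logical = logicalSize(100);            // 512 + 512 + 1024 = 2048
        long padded10240 = roundUp(logical, 10240); // default block: 10240
        long padded512 = roundUp(logical, 512);     // one-record block: 2048
        System.out.println(logical + " " + padded10240 + " " + padded512);
    }
}
```

So for small archives the default 10240-octet block can multiply the output size several times over, while a 512-octet block never adds anything beyond what the record format itself requires.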
> Fixed length blocking in multiples of 512 Bytes can be supported by wrapping
> the destination output stream in a modified BufferedOutputStream that does
> not permit flushing of partial blocks, and pads on close. This would only be
> used as necessary.
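A minimal sketch of such a wrapper, using only java.io (the class name FixedBlockPaddingOutputStream is hypothetical, not an actual Commons Compress class): it forwards data to the wrapped stream only in whole blocks, refuses to flush a partial block, and zero-pads the tail on close. Bulk writes fall back to the single-byte path in this sketch, which keeps it short at the cost of efficiency.

```java
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.Arrays;

/** Buffers writes and emits only full fixed-size blocks downstream. */
class FixedBlockPaddingOutputStream extends FilterOutputStream {
    private final int blockSize;
    private final byte[] buffer;
    private int count;

    FixedBlockPaddingOutputStream(OutputStream out, int blockSize) {
        super(out);
        this.blockSize = blockSize;
        this.buffer = new byte[blockSize];
    }

    @Override
    public void write(int b) throws IOException {
        buffer[count++] = (byte) b;
        if (count == blockSize) {
            out.write(buffer, 0, blockSize); // one physical block per write()
            count = 0;
        }
    }

    @Override
    public void flush() throws IOException {
        // Deliberately does NOT forward a partial block; only full blocks
        // ever reach the wrapped stream.
        out.flush();
    }

    @Override
    public void close() throws IOException {
        if (count > 0) {
            // Zero-pad the final partial block so the total output is an
            // exact multiple of blockSize.
            Arrays.fill(buffer, count, blockSize, (byte) 0);
            out.write(buffer, 0, blockSize);
            count = 0;
        }
        super.close();
    }
}
```

With a 512-octet block size, writing 700 bytes through this wrapper would emit one full block immediately and a second, zero-padded block on close, for 1024 bytes total.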
>
> If a record size of 512 bytes is being used, it could be useful to store that
> information in an extended header at the start of the file. That allows for
> in-place appending to an archive without having to read the entire archive
> first (as long as the original end-of-archive location is journaled to
> support recovery).
> There is even an advantage for xz-compressed files, as every block but the
> last can be copied without having to decompress and then recompress.
> In the latter scenario, it would be useful to be able to signal to the
> subordinate layer to start a new block before writing the final 1024 nulls.
> In that situation, either a new block can be started, overwriting the EOA
> and xz index blocks, with the saved index info appended at the end; or the
> block immediately preceding the EOA markers can be decompressed and
> recompressed, rebuilding the dictionary and index structures so that the
> block can be continued. That's a different issue, though.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)