[ 
https://issues.apache.org/jira/browse/COMPRESS-403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Bodewig resolved COMPRESS-403.
-------------------------------------
       Resolution: Fixed
    Fix Version/s: 1.15

All subtasks have been resolved by now.

> Block and Record Size issues in  TarArchiveOutputStream 
> --------------------------------------------------------
>
>                 Key: COMPRESS-403
>                 URL: https://issues.apache.org/jira/browse/COMPRESS-403
>             Project: Commons Compress
>          Issue Type: Improvement
>          Components: Archivers
>    Affects Versions: 1.14
>            Reporter: Simon Spero
>            Priority: Minor
>             Fix For: 1.15
>
>
> According to the pax spec,
> [§4.100.13.01|http://pubs.opengroup.org/onlinepubs/009695399/utilities/pax.html#tag_04_100_13_01]:
> bq. A pax archive tape or file produced in the -x pax format shall contain a 
> series of blocks. The physical layout of the archive shall be identical to 
> the ustar format
> and [§4.100.13.06|http://pubs.opengroup.org/onlinepubs/009695399/utilities/pax.html#tag_04_100_13_06]:
> bq. A ustar archive tape or file shall contain a series of logical records. 
> Each logical record shall be a fixed-size logical record of 512 octets.
> ...
> bq. The logical records *may* be grouped for physical I/O operations, as 
> described under the -b blocksize and -x ustar options. Each group of logical 
> records *may* be written with a single operation equivalent to the write() 
> function. On magnetic tape, the result of this write *shall* be a single tape 
> physical block. The last physical block *shall* always be the full size, so 
> logical records after the two zero logical records *may* contain undefined 
> data.
> bq. pax. The default blocksize for this format for character special archive 
> files *shall* be 5120. Implementations *shall* support all blocksize values 
> less than or equal to 32256 that are multiples of 512.
> bq. ustar. The default blocksize for this format for character special 
> archive files *shall* be 10240. Implementations *shall* support all blocksize 
> values less than or equal to 32256 that are multiples of 512.
> bq. Implementations are permitted to modify the block-size value based on the 
> archive format or the device to which the archive is being written. This is 
> to provide implementations with the opportunity to take advantage of special 
> types of devices, and it should not be used without a great deal of 
> consideration as it almost certainly decreases archive portability.
> The current implementation of TarArchiveOutputStream:
> # allows the logical record size to be altered
> # has a default block size of 10240
> # has two separate logical-record-sized buffers, and frequently double-buffers
> in order to write to the wrapped output stream in units of a logical record
> rather than a physical block.
> I would hazard a guess that very few users of commons-compress are writing
> directly to a tape drive, where the block size is of great import. It is
> also not possible to guarantee that a subordinate output stream won't buffer
> in chunks of a different size (5120 and 10240 bytes aren't ideal for modern
> hard drives with 4096-byte sectors, or filesystems like ZFS with a default
> recordsize of 128K).
> The main effect the record and block sizes have is the extra padding they
> require. For the purposes of a Java output device, the optimal blocksize to
> modify to is probably just a single record; since all implementations must
> handle 512-byte blocks, and must detect block size on input (or simulate the
> same), this cannot affect compatibility.
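As a worked illustration of the padding cost (a hypothetical example with made-up entry sizes, not taken from the issue itself), consider archiving a single one-byte entry: the logical content is four 512-octet records (header, one padded data record, two end-of-archive records), but the archive is then rounded up to a whole physical block.

```java
public class TarPadding {
    static final int RECORD = 512;

    // Total bytes a minimal ustar archive occupies for one entry of the
    // given size: header record + data rounded up to whole records + two
    // zero end-of-archive records, with the result rounded up to the
    // physical block size.
    static long archiveSize(long entrySize, int blockSize) {
        long records = 1                                 // header record
                + (entrySize + RECORD - 1) / RECORD      // data, record-padded
                + 2;                                     // end-of-archive
        long logical = records * RECORD;
        return ((logical + blockSize - 1) / blockSize) * blockSize;
    }

    public static void main(String[] args) {
        System.out.println(archiveSize(1, 512));    // 2048: no extra padding
        System.out.println(archiveSize(1, 10240));  // 10240: 8192 wasted bytes
    }
}
```

With 512-byte blocking the one-byte entry costs 2048 bytes; with the 10240 default it costs 10240, all of the difference being block padding.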
> Fixed-length blocking in multiples of 512 bytes can be supported by wrapping
> the destination output stream in a modified BufferedOutputStream that does
> not permit flushing of partial blocks, and pads on close. This would only be
> used as necessary.
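A minimal sketch of such a wrapper (the class name is hypothetical, and this is one possible design rather than what Commons Compress ships): it buffers writes into fixed-size blocks, refuses to flush a partial block, and zero-pads the final block on close.

```java
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.Arrays;

// Hypothetical wrapper: only whole blocks ever reach the underlying stream.
class PaddedBlockOutputStream extends FilterOutputStream {
    private final byte[] block;
    private int filled;

    PaddedBlockOutputStream(OutputStream out, int blockSize) {
        super(out);
        if (blockSize <= 0 || blockSize % 512 != 0) {
            throw new IllegalArgumentException(
                "block size must be a positive multiple of 512");
        }
        this.block = new byte[blockSize];
    }

    @Override
    public void write(int b) throws IOException {
        block[filled++] = (byte) b;
        if (filled == block.length) {
            out.write(block, 0, block.length);
            filled = 0;
        }
    }

    @Override
    public void flush() throws IOException {
        // Deliberately suppress flushing of a partial block; pass the
        // flush through only when we sit exactly on a block boundary.
        if (filled == 0) {
            out.flush();
        }
    }

    @Override
    public void close() throws IOException {
        if (filled > 0) {
            // Zero-pad the tail of the last block, then emit it whole.
            Arrays.fill(block, filled, block.length, (byte) 0);
            out.write(block, 0, block.length);
            filled = 0;
        }
        super.close(); // flushes and closes the wrapped stream
    }
}
```

Writing, say, 600 bytes through a 10240-byte instance and closing it produces exactly one 10240-byte block on the wrapped stream, the last 9640 bytes being zeros.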
>  
> If a record size of 512 bytes is being used, it could be useful to store that 
> information in an extended header at the start of the file. That allows for 
> in-place appending to an archive without having to read the entire archive 
> first (as long as the original end-of-archive location is journaled to 
> support recovery). 
> There is even an advantage for xz-compressed files, as every block but the
> last can be copied without having to decompress and then recompress.
> In the latter scenario, it would be useful to be able to signal to the
> subordinate layer to start a new block before writing the final 1024 nulls.
> In that situation, either a new block can be started, overwriting the EOA and
> xz index blocks, with the saved index info re-appended at the end; or the
> block immediately preceding the EOA markers can be decompressed and
> recompressed, which rebuilds the dictionary and index structures and allows
> the block to be continued. That's a different issue.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
