http://git-wip-us.apache.org/repos/asf/commons-compress/blob/a433f625/doc/APPNOTE-6.3.4.TXT ---------------------------------------------------------------------- diff --git a/doc/APPNOTE-6.3.4.TXT b/doc/APPNOTE-6.3.4.TXT new file mode 100644 index 0000000..8cf4bb7 --- /dev/null +++ b/doc/APPNOTE-6.3.4.TXT @@ -0,0 +1,3498 @@ +File: APPNOTE.TXT - .ZIP File Format Specification +Version: 6.3.4 +Status: Final - replaces version 6.3.3 +Revised: October 1, 2014 +Copyright (c) 1989 - 2014 PKWARE Inc., All Rights Reserved. + +1.0 Introduction +--------------- + +1.1 Purpose +----------- + + 1.1.1 This specification is intended to define a cross-platform, + interoperable file storage and transfer format. Since its + first publication in 1989, PKWARE, Inc. ("PKWARE") has remained + committed to ensuring the interoperability of the .ZIP file + format through periodic publication and maintenance of this + specification. We trust that all .ZIP compatible vendors and + application developers that use and benefit from this format + will share and support this commitment to interoperability. + +1.2 Scope +--------- + + 1.2.1 ZIP is one of the most widely used compressed file formats. It is + universally used to aggregate, compress, and encrypt files into a single + interoperable container. No specific use or application need is + defined by this format and no specific implementation guidance is + provided. This document provides details on the storage format for + creating ZIP files. Information is provided on the records and + fields that describe what a ZIP file is. + +1.3 Trademarks +-------------- + + 1.3.1 PKWARE, PKZIP, SecureZIP, and PKSFX are registered trademarks of + PKWARE, Inc. in the United States and elsewhere. PKPatchMaker, + Deflate64, and ZIP64 are trademarks of PKWARE, Inc. Other marks + referenced within this document appear for identification + purposes only and are the property of their respective owners. 
+ + +1.4 Permitted Use +----------------- + + 1.4.1 This document, "APPNOTE.TXT - .ZIP File Format Specification" is the + exclusive property of PKWARE. Use of the information contained in this + document is permitted solely for the purpose of creating products, + programs and processes that read and write files in the ZIP format + subject to the terms and conditions herein. + + 1.4.2 Use of the content of this document within other publications is + permitted only through reference to this document. Any reproduction + or distribution of this document in whole or in part without prior + written permission from PKWARE is strictly prohibited. + + 1.4.3 Certain technological components provided in this document are the + patented proprietary technology of PKWARE and as such require a + separate, executed license agreement from PKWARE. Applicable + components are marked with the following, or similar, statement: + 'Refer to the section in this document entitled "Incorporating + PKWARE Proprietary Technology into Your Product" for more information'. + +1.5 Contacting PKWARE +--------------------- + + 1.5.1 If you have questions on this format, its use, or licensing, or if you + wish to report defects, request changes or additions, please contact: + + PKWARE, Inc. + 201 E. Pittsburgh Avenue, Suite 400 + Milwaukee, WI 53204 + +1-414-289-9788 + +1-414-289-9789 FAX + [email protected] + + 1.5.2 Information about this format and copies of this document are publicly + available at: + + http://www.pkware.com/appnote + +1.6 Disclaimer +-------------- + + 1.6.1 Although PKWARE will attempt to supply current and accurate + information relating to its file formats, algorithms, and the + subject programs, the possibility of error or omission cannot + be eliminated. 
PKWARE therefore expressly disclaims any warranty + that the information contained in the associated materials relating + to the subject programs and/or the format of the files created or + accessed by the subject programs and/or the algorithms used by + the subject programs, or any other matter, is current, correct or + accurate as delivered. Any risk of damage due to any possible + inaccurate information is assumed by the user of the information. + Furthermore, the information relating to the subject programs + and/or the file formats created or accessed by the subject + programs and/or the algorithms used by the subject programs is + subject to change without notice. + +2.0 Revisions +-------------- + +2.1 Document Status +-------------------- + + 2.1.1 If the STATUS of this file is marked as DRAFT, the content + defines proposed revisions to this specification which may consist + of changes to the ZIP format itself, or that may consist of other + content changes to this document. Versions of this document and + the format in DRAFT form may be subject to modification prior to + publication STATUS of FINAL. DRAFT versions are published periodically + to provide notification to the ZIP community of pending changes and to + provide opportunity for review and comment. + + 2.1.2 Versions of this document having a STATUS of FINAL are + considered to be in the final form for that version of the document + and are not subject to further change until a new, higher version + numbered document is published. Newer versions of this format + specification are intended to remain interoperable with all prior + versions whenever technically possible. 
+ +2.2 Change Log +-------------- + + Version Change Description Date + ------- ------------------ ---------- + 5.2 -Single Password Symmetric Encryption 07/16/2003 + storage + + 6.1.0 -Smartcard compatibility 01/20/2004 + -Documentation on certificate storage + + 6.2.0 -Introduction of Central Directory 04/26/2004 + Encryption for encrypting metadata + -Added OS X to Version Made By values + + 6.2.1 -Added Extra Field placeholder for 04/01/2005 + POSZIP using ID 0x4690 + + -Clarified size field on + "zip64 end of central directory record" + + 6.2.2 -Documented Final Feature Specification 01/06/2006 + for Strong Encryption + + -Clarifications and typographical + corrections + + 6.3.0 -Added tape positioning storage 09/29/2006 + parameters + + -Expanded list of supported hash algorithms + + -Expanded list of supported compression + algorithms + + -Expanded list of supported encryption + algorithms + + -Added option for Unicode filename + storage + + -Clarifications for consistent use + of Data Descriptor records + + -Added additional "Extra Field" + definitions + + 6.3.1 -Corrected standard hash values for 04/11/2007 + SHA-256/384/512 + + 6.3.2 -Added compression method 97 09/28/2007 + + -Documented InfoZIP "Extra Field" + values for UTF-8 file name and + file comment storage + + 6.3.3 -Formatting changes to support 09/01/2012 + easier referencing of this APPNOTE + from other documents and standards + + 6.3.4 -Address change 10/01/2014 + + +3.0 Notations +------------- + + 3.1 Use of the term MUST or SHALL indicates a required element. + + 3.2 MAY NOT or SHALL NOT indicates an element is prohibited from use. + + 3.3 SHOULD indicates a RECOMMENDED element. + + 3.4 SHOULD NOT indicates an element NOT RECOMMENDED for use. + + 3.5 MAY indicates an OPTIONAL element. 
+ + +4.0 ZIP Files +------------- + +4.1 What is a ZIP file +---------------------- + + 4.1.1 ZIP files MAY be identified by the standard .ZIP file extension + although use of a file extension is not required. Use of the + extension .ZIPX is also recognized and MAY be used for ZIP files. + Other common file extensions using the ZIP format include .JAR, .WAR, + .DOCX, .XLSX, .PPTX, .ODT, .ODS, .ODP and others. Programs reading or + writing ZIP files SHOULD rely on internal record signatures described + in this document to identify files in this format. + + 4.1.2 ZIP files SHOULD contain at least one file and MAY contain + multiple files. + + 4.1.3 Data compression MAY be used to reduce the size of files + placed into a ZIP file, but is not required. This format supports the + use of multiple data compression algorithms. When compression is used, + one of the documented compression algorithms MUST be used. Implementors + are advised to experiment with their data to determine which of the + available algorithms provides the best compression for their needs. + Compression method 8 (Deflate) is the method used by default by most + ZIP compatible application programs. + + + 4.1.4 Data encryption MAY be used to protect files within a ZIP file. + Keying methods supported for encryption within this format include + passwords and public/private keys. Either MAY be used individually + or in combination. Encryption MAY be applied to individual files. + Additional security MAY be used through the encryption of ZIP file + metadata stored within the Central Directory. See the section on the + Strong Encryption Specification for information. Refer to the section + in this document entitled "Incorporating PKWARE Proprietary Technology + into Your Product" for more information. + + 4.1.5 Data integrity MUST be provided for each file using CRC32. + + 4.1.6 Additional data integrity MAY be included through the use of + digital signatures. 
Individual files MAY be signed with one or more + digital signatures. The Central Directory, if signed, MUST use a + single signature. + + 4.1.7 Files MAY be placed within a ZIP file uncompressed or stored. + The term "stored" as used in the context of this document means the file + is copied into the ZIP file uncompressed. + + 4.1.8 Each data file placed into a ZIP file MAY be compressed, stored, + encrypted or digitally signed independent of how other data files in the + same ZIP file are archived. + + 4.1.9 ZIP files MAY be streamed, split into segments (on fixed or on + removable media) or "self-extracting". Self-extracting ZIP + files MUST include extraction code for a target platform within + the ZIP file. + + 4.1.10 Extensibility is provided for platform or application specific + needs through extra data fields that MAY be defined for custom + purposes. Extra data definitions MUST NOT conflict with existing + documented record definitions. + + 4.1.11 Common uses for ZIP MAY also include the use of manifest files. + Manifest files store application specific information within a file stored + within the ZIP file. This manifest file SHOULD be the first file in the + ZIP file. This specification does not provide any information or guidance on + the use of manifest files within ZIP files. Refer to the application developer + for information on using manifest files and for any additional profile + information on using ZIP within an application. + + 4.1.12 ZIP files MAY be placed within other ZIP files. + +4.2 ZIP Metadata +---------------- + + 4.2.1 ZIP files are identified by metadata consisting of defined record types + containing the storage information necessary for maintaining the files + placed into a ZIP file. Each record type MUST be identified using a header + signature that identifies the record type. Signature values begin with the + two byte constant marker of 0x4b50, representing the characters "PK". 
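As an illustration of 4.2.1, the following is a minimal Python sketch of how a reader might recognize a record by its header signature. The function names `has_pk_marker` and `record_type` are hypothetical and not part of this specification; the signature values in the table are those listed in section 4.3 of this document, and only a subset is shown.

```python
import struct
from typing import Optional

# A subset of the header signatures listed in section 4.3 of this document.
RECORD_SIGNATURES = {
    0x04034B50: "local file header",
    0x02014B50: "central file header",
    0x06054B50: "end of central directory record",
    0x06064B50: "zip64 end of central directory record",
    0x07064B50: "zip64 end of central directory locator",
}


def has_pk_marker(data: bytes) -> bool:
    """Return True if data starts with the two byte marker 0x4b50 ("PK")."""
    if len(data) < 2:
        return False
    # Values are little-endian, so the bytes on disk are 0x50 ('P'), 0x4b ('K').
    (marker,) = struct.unpack_from("<H", data, 0)
    return marker == 0x4B50


def record_type(data: bytes) -> Optional[str]:
    """Map the leading 4 byte signature to a record name, or None if unknown."""
    if len(data) < 4:
        return None
    (signature,) = struct.unpack_from("<I", data, 0)
    return RECORD_SIGNATURES.get(signature)
```

A caller would typically read the first four bytes at a candidate offset and pass them to `record_type`; an unknown value yields `None` rather than an error, since this sketch does not cover every record defined here.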
+ + +4.3 General Format of a .ZIP file +--------------------------------- + + 4.3.1 A ZIP file MUST contain an "end of central directory record". A ZIP + file containing only an "end of central directory record" is considered an + empty ZIP file. Files may be added or replaced within a ZIP file, or deleted. + A ZIP file MUST have only one "end of central directory record". Other + records defined in this specification MAY be used as needed to support + storage requirements for individual ZIP files. + + 4.3.2 Each file placed into a ZIP file MUST be preceded by a "local + file header" record for that file. Each "local file header" MUST be + accompanied by a corresponding "central directory header" record within + the central directory section of the ZIP file. + + 4.3.3 Files MAY be stored in arbitrary order within a ZIP file. A ZIP + file MAY span multiple volumes or it MAY be split into user-defined + segment sizes. All values MUST be stored in little-endian byte order unless + otherwise specified in this document for a specific data element. + + 4.3.4 Compression MUST NOT be applied to a "local file header", an "encryption + header", or an "end of central directory record". Individual "central + directory records" MUST NOT be compressed, but the aggregate of all central + directory records MAY be compressed. + + 4.3.5 File data MAY be followed by a "data descriptor" for the file. Data + descriptors are used to facilitate ZIP file streaming. + + + 4.3.6 Overall .ZIP file format: + + [local file header 1] + [encryption header 1] + [file data 1] + [data descriptor 1] + . + . + . + [local file header n] + [encryption header n] + [file data n] + [data descriptor n] + [archive decryption header] + [archive extra data record] + [central directory header 1] + . + . + . 
+ [central directory header n] + [zip64 end of central directory record] + [zip64 end of central directory locator] + [end of central directory record] + + + 4.3.7 Local file header: + + local file header signature 4 bytes (0x04034b50) + version needed to extract 2 bytes + general purpose bit flag 2 bytes + compression method 2 bytes + last mod file time 2 bytes + last mod file date 2 bytes + crc-32 4 bytes + compressed size 4 bytes + uncompressed size 4 bytes + file name length 2 bytes + extra field length 2 bytes + + file name (variable size) + extra field (variable size) + + 4.3.8 File data + + Immediately following the local header for a file + SHOULD be placed the compressed or stored data for the file. + If the file is encrypted, the encryption header for the file + SHOULD be placed after the local header and before the file + data. The series of [local file header][encryption header] + [file data][data descriptor] repeats for each file in the + .ZIP archive. + + Zero-byte files, directories, and other file types that + contain no content MUST NOT include file data. + + 4.3.9 Data descriptor: + + crc-32 4 bytes + compressed size 4 bytes + uncompressed size 4 bytes + + 4.3.9.1 This descriptor MUST exist if bit 3 of the general + purpose bit flag is set (see below). It is byte aligned + and immediately follows the last byte of compressed data. + This descriptor SHOULD be used only when it was not possible to + seek in the output .ZIP file, e.g., when the output .ZIP file + was standard output or a non-seekable device. For ZIP64(tm) format + archives, the compressed and uncompressed sizes are 8 bytes each. + + 4.3.9.2 When compressing files, compressed and uncompressed sizes + should be stored in ZIP64 format (as 8 byte values) when a + file's size exceeds 0xFFFFFFFF. However ZIP64 format may be + used regardless of the size of a file. 
When extracting, if + the zip64 extended information extra field is present for + the file the compressed and uncompressed sizes will be 8 + byte values. + + 4.3.9.3 Although not originally assigned a signature, the value + 0x08074b50 has commonly been adopted as a signature value + for the data descriptor record. Implementers should be + aware that ZIP files may be encountered with or without this + signature marking data descriptors and SHOULD account for + either case when reading ZIP files to ensure compatibility. + + 4.3.9.4 When writing ZIP files, implementors SHOULD include the + signature value marking the data descriptor record. When + the signature is used, the fields currently defined for + the data descriptor record will immediately follow the + signature. + + 4.3.9.5 An extensible data descriptor will be released in a + future version of this APPNOTE. This new record is intended to + resolve conflicts with the use of this record going forward, + and to provide better support for streamed file processing. + + 4.3.9.6 When the Central Directory Encryption method is used, + the data descriptor record is not required, but MAY be used. + If present, and bit 3 of the general purpose bit field is set to + indicate its presence, the values in fields of the data descriptor + record MUST be set to binary zeros. See the section on the Strong + Encryption Specification for information. Refer to the section in + this document entitled "Incorporating PKWARE Proprietary Technology + into Your Product" for more information. + + + 4.3.10 Archive decryption header: + + 4.3.10.1 The Archive Decryption Header is introduced in version 6.2 + of the ZIP format specification. This record exists in support + of the Central Directory Encryption Feature implemented as part of + the Strong Encryption Specification as described in this document. + When the Central Directory Structure is encrypted, this decryption + header MUST precede the encrypted data segment. 
+ + 4.3.10.2 The encrypted data segment SHALL consist of the Archive + extra data record (if present) and the encrypted Central Directory + Structure data. The format of this data record is identical to the + Decryption header record preceding compressed file data. If the + central directory structure is encrypted, the location of the start of + this data record is determined using the Start of Central Directory + field in the Zip64 End of Central Directory record. See the + section on the Strong Encryption Specification for information + on the fields used in the Archive Decryption Header record. + Refer to the section in this document entitled "Incorporating + PKWARE Proprietary Technology into Your Product" for more information. + + + 4.3.11 Archive extra data record: + + archive extra data signature 4 bytes (0x08064b50) + extra field length 4 bytes + extra field data (variable size) + + 4.3.11.1 The Archive Extra Data Record is introduced in version 6.2 + of the ZIP format specification. This record MAY be used in support + of the Central Directory Encryption Feature implemented as part of + the Strong Encryption Specification as described in this document. + When present, this record MUST immediately precede the central + directory data structure. + + 4.3.11.2 The size of this data record SHALL be included in the + Size of the Central Directory field in the End of Central + Directory record. If the central directory structure is compressed, + but not encrypted, the location of the start of this data record is + determined using the Start of Central Directory field in the Zip64 + End of Central Directory record. Refer to the section in this document + entitled "Incorporating PKWARE Proprietary Technology into Your + Product" for more information. + + 4.3.12 Central directory structure: + + [central directory header 1] + . + . + . 
+ [central directory header n] + [digital signature] + + File header: + + central file header signature 4 bytes (0x02014b50) + version made by 2 bytes + version needed to extract 2 bytes + general purpose bit flag 2 bytes + compression method 2 bytes + last mod file time 2 bytes + last mod file date 2 bytes + crc-32 4 bytes + compressed size 4 bytes + uncompressed size 4 bytes + file name length 2 bytes + extra field length 2 bytes + file comment length 2 bytes + disk number start 2 bytes + internal file attributes 2 bytes + external file attributes 4 bytes + relative offset of local header 4 bytes + + file name (variable size) + extra field (variable size) + file comment (variable size) + + 4.3.13 Digital signature: + + header signature 4 bytes (0x05054b50) + size of data 2 bytes + signature data (variable size) + + With the introduction of the Central Directory Encryption + feature in version 6.2 of this specification, the Central + Directory Structure MAY be stored both compressed and encrypted. + Although not required, it is assumed when encrypting the + Central Directory Structure, that it will be compressed + for greater storage efficiency. Information on the + Central Directory Encryption feature can be found in the section + describing the Strong Encryption Specification. The Digital + Signature record will be neither compressed nor encrypted. 
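The central file header layout given in 4.3.12 can be sketched in Python as follows. This is only an illustrative reading of the field table, not a reference implementation: the names `CENTRAL_HEADER` and `parse_central_header` and the dictionary keys are hypothetical, and ZIP64 extra field handling, encryption, and the digital signature record are deliberately left out.

```python
import struct

# Fixed portion of the central file header (46 bytes), little-endian:
# signature (4), six 2 byte fields, three 4 byte fields,
# five 2 byte fields, two 4 byte fields.
CENTRAL_HEADER = struct.Struct("<I6H3I5H2I")
CENTRAL_SIGNATURE = 0x02014B50


def parse_central_header(buf: bytes, offset: int = 0):
    """Parse one central file header; return (fields, offset of next record)."""
    (signature, version_made_by, version_needed, flags, method,
     mod_time, mod_date, crc32, compressed_size, uncompressed_size,
     name_len, extra_len, comment_len, disk_start,
     internal_attrs, external_attrs, local_header_offset,
     ) = CENTRAL_HEADER.unpack_from(buf, offset)
    if signature != CENTRAL_SIGNATURE:
        raise ValueError("not a central file header")

    # The three variable size fields follow the fixed fields in this order.
    pos = offset + CENTRAL_HEADER.size
    file_name = buf[pos:pos + name_len]
    pos += name_len
    extra_field = buf[pos:pos + extra_len]
    pos += extra_len
    file_comment = buf[pos:pos + comment_len]
    pos += comment_len

    fields = {
        "version_made_by": version_made_by,
        "version_needed": version_needed,
        "flags": flags,
        "compression_method": method,
        "mod_time": mod_time,
        "mod_date": mod_date,
        "crc32": crc32,
        "compressed_size": compressed_size,
        "uncompressed_size": uncompressed_size,
        "disk_start": disk_start,
        "internal_attrs": internal_attrs,
        "external_attrs": external_attrs,
        "local_header_offset": local_header_offset,
        "file_name": file_name,
        "extra_field": extra_field,
        "file_comment": file_comment,
    }
    return fields, pos
```

Returning the offset of the next record lets a caller walk the central directory by calling the function repeatedly until the expected number of entries has been read.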
+ + 4.3.14 Zip64 end of central directory record + + zip64 end of central dir + signature 4 bytes (0x06064b50) + size of zip64 end of central + directory record 8 bytes + version made by 2 bytes + version needed to extract 2 bytes + number of this disk 4 bytes + number of the disk with the + start of the central directory 4 bytes + total number of entries in the + central directory on this disk 8 bytes + total number of entries in the + central directory 8 bytes + size of the central directory 8 bytes + offset of start of central + directory with respect to + the starting disk number 8 bytes + zip64 extensible data sector (variable size) + + 4.3.14.1 The value stored into the "size of zip64 end of central + directory record" should be the size of the remaining + record and should not include the leading 12 bytes. + + Size = SizeOfFixedFields + SizeOfVariableData - 12. + + 4.3.14.2 The above record structure defines Version 1 of the + zip64 end of central directory record. Version 1 was + implemented in versions of this specification preceding + 6.2 in support of the ZIP64 large file feature. The + introduction of the Central Directory Encryption feature + implemented in version 6.2 as part of the Strong Encryption + Specification defines Version 2 of this record structure. + Refer to the section describing the Strong Encryption + Specification for details on the version 2 format for + this record. Refer to the section in this document entitled + "Incorporating PKWARE Proprietary Technology into Your Product" + for more information applicable to use of Version 2 of this + record. + + 4.3.14.3 Special purpose data MAY reside in the zip64 extensible + data sector field following either a V1 or V2 version of this + record. 
To ensure identification of this special purpose data + it must include an identifying header block consisting of the + following: + + Header ID - 2 bytes + Data Size - 4 bytes + + The Header ID field indicates the type of data that is in the + data block that follows. + + Data Size identifies the number of bytes that follow for this + data block type. + + 4.3.14.4 Multiple special purpose data blocks MAY be present. + Each MUST be preceded by a Header ID and Data Size field. Current + mappings of Header ID values supported in this field are as + defined in APPENDIX C. + + 4.3.15 Zip64 end of central directory locator + + zip64 end of central dir locator + signature 4 bytes (0x07064b50) + number of the disk with the + start of the zip64 end of + central directory 4 bytes + relative offset of the zip64 + end of central directory record 8 bytes + total number of disks 4 bytes + + 4.3.16 End of central directory record: + + end of central dir signature 4 bytes (0x06054b50) + number of this disk 2 bytes + number of the disk with the + start of the central directory 2 bytes + total number of entries in the + central directory on this disk 2 bytes + total number of entries in + the central directory 2 bytes + size of the central directory 4 bytes + offset of start of central + directory with respect to + the starting disk number 4 bytes + .ZIP file comment length 2 bytes + .ZIP file comment (variable size) + +4.4 Explanation of fields +-------------------------- + + 4.4.1 General notes on fields + + 4.4.1.1 All fields unless otherwise noted are unsigned and stored + in Intel low-byte:high-byte, low-word:high-word order. + + 4.4.1.2 String fields are not null terminated, since the length + is given explicitly. + + 4.4.1.3 The entries in the central directory may not necessarily + be in the same order that files appear in the .ZIP file. 
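To illustrate the end of central directory record in 4.3.16 together with the byte order rule in 4.4.1.1, here is a minimal Python sketch. The names `EOCD` and `parse_eocd` are hypothetical. Locating the record by searching backwards for its signature is a common practice assumed here, not something this document prescribes, and it can be misled if the same byte sequence appears inside the .ZIP file comment.

```python
import struct

# Fixed portion of the end of central directory record (22 bytes),
# all fields unsigned and little-endian as stated in 4.4.1.1.
EOCD = struct.Struct("<I4H2IH")
EOCD_SIGNATURE_BYTES = b"PK\x05\x06"  # 0x06054b50 as stored on disk


def parse_eocd(buf: bytes) -> dict:
    """Find and parse the end of central directory record in buf."""
    # Assumption: the last occurrence of the signature starts the record.
    pos = buf.rfind(EOCD_SIGNATURE_BYTES)
    if pos < 0 or pos + EOCD.size > len(buf):
        raise ValueError("end of central directory record not found")
    (_signature, this_disk, cd_start_disk, entries_on_disk, entries_total,
     cd_size, cd_offset, comment_len) = EOCD.unpack_from(buf, pos)
    comment_start = pos + EOCD.size
    return {
        "this_disk": this_disk,
        "cd_start_disk": cd_start_disk,
        "entries_on_disk": entries_on_disk,
        "entries_total": entries_total,
        "cd_size": cd_size,
        "cd_offset": cd_offset,
        # String fields are not null terminated; the length is explicit.
        "comment": buf[comment_start:comment_start + comment_len],
    }
```

An empty ZIP file, as described in 4.3.1, consists of this record alone, which makes it a convenient smallest input for trying the function.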
+ + 4.4.1.4 If one of the fields in the end of central directory + record is too small to hold required data, the field should be + set to -1 (0xFFFF or 0xFFFFFFFF) and the ZIP64 format record + should be created. + + 4.4.1.5 The end of central directory record and the Zip64 end + of central directory locator record MUST reside on the same + disk when splitting or spanning an archive. + + 4.4.2 version made by (2 bytes) + + 4.4.2.1 The upper byte indicates the compatibility of the file + attribute information. If the external file attributes + are compatible with MS-DOS and can be read by PKZIP for + DOS version 2.04g then this value will be zero. If these + attributes are not compatible, then this value will + identify the host system on which the attributes are + compatible. Software can use this information to determine + the line record format for text files etc. + + 4.4.2.2 The current mappings are: + + 0 - MS-DOS and OS/2 (FAT / VFAT / FAT32 file systems) + 1 - Amiga 2 - OpenVMS + 3 - UNIX 4 - VM/CMS + 5 - Atari ST 6 - OS/2 H.P.F.S. + 7 - Macintosh 8 - Z-System + 9 - CP/M 10 - Windows NTFS + 11 - MVS (OS/390 - Z/OS) 12 - VSE + 13 - Acorn Risc 14 - VFAT + 15 - alternate MVS 16 - BeOS + 17 - Tandem 18 - OS/400 + 19 - OS X (Darwin) 20 thru 255 - unused + + 4.4.2.3 The lower byte indicates the ZIP specification version + (the version of this document) supported by the software + used to encode the file. The value/10 indicates the major + version number, and the value mod 10 is the minor version + number. + + 4.4.3 version needed to extract (2 bytes) + + 4.4.3.1 The minimum supported ZIP specification version needed + to extract the file, mapped as above. This value is based on + the specific format features a ZIP program MUST support to + be able to extract the file. If multiple features are + applied to a file, the minimum version MUST be set to the + feature having the highest value. 
New features or feature + changes affecting the published format specification will be + implemented using higher version numbers than the last + published value to avoid conflict. + + 4.4.3.2 Current minimum feature versions are as defined below: + + 1.0 - Default value + 1.1 - File is a volume label + 2.0 - File is a folder (directory) + 2.0 - File is compressed using Deflate compression + 2.0 - File is encrypted using traditional PKWARE encryption + 2.1 - File is compressed using Deflate64(tm) + 2.5 - File is compressed using PKWARE DCL Implode + 2.7 - File is a patch data set + 4.5 - File uses ZIP64 format extensions + 4.6 - File is compressed using BZIP2 compression* + 5.0 - File is encrypted using DES + 5.0 - File is encrypted using 3DES + 5.0 - File is encrypted using original RC2 encryption + 5.0 - File is encrypted using RC4 encryption + 5.1 - File is encrypted using AES encryption + 5.1 - File is encrypted using corrected RC2 encryption** + 5.2 - File is encrypted using corrected RC2-64 encryption** + 6.1 - File is encrypted using non-OAEP key wrapping*** + 6.2 - Central directory encryption + 6.3 - File is compressed using LZMA + 6.3 - File is compressed using PPMd+ + 6.3 - File is encrypted using Blowfish + 6.3 - File is encrypted using Twofish + + 4.4.3.3 Notes on version needed to extract + + * Early 7.x (pre-7.2) versions of PKZIP incorrectly set the + version needed to extract for BZIP2 compression to be 50 + when it should have been 46. + + ** Refer to the section on Strong Encryption Specification + for additional information regarding RC2 corrections. + + *** Certificate encryption using non-OAEP key wrapping is the + intended mode of operation for all versions beginning with 6.1. + Support for OAEP key wrapping MUST only be used for + backward compatibility when sending ZIP files to be opened by + versions of PKZIP older than 6.1 (5.0 or 6.0). 
+ + + Files compressed using PPMd MUST set the version + needed to extract field to 6.3, however, not all ZIP + programs enforce this and may be unable to decompress + data files compressed using PPMd if this value is set. + + When using ZIP64 extensions, the corresponding value in the + zip64 end of central directory record MUST also be set. + This field should be set appropriately to indicate whether + Version 1 or Version 2 format is in use. + + + 4.4.4 general purpose bit flag: (2 bytes) + + Bit 0: If set, indicates that the file is encrypted. + + (For Method 6 - Imploding) + Bit 1: If the compression method used was type 6, + Imploding, then this bit, if set, indicates + an 8K sliding dictionary was used. If clear, + then a 4K sliding dictionary was used. + + Bit 2: If the compression method used was type 6, + Imploding, then this bit, if set, indicates + 3 Shannon-Fano trees were used to encode the + sliding dictionary output. If clear, then 2 + Shannon-Fano trees were used. + + (For Methods 8 and 9 - Deflating) + Bit 2 Bit 1 + 0 0 Normal (-en) compression option was used. + 0 1 Maximum (-exx/-ex) compression option was used. + 1 0 Fast (-ef) compression option was used. + 1 1 Super Fast (-es) compression option was used. + + (For Method 14 - LZMA) + Bit 1: If the compression method used was type 14, + LZMA, then this bit, if set, indicates + an end-of-stream (EOS) marker is used to + mark the end of the compressed data stream. + If clear, then an EOS marker is not present + and the compressed data size must be known + to extract. + + Note: Bits 1 and 2 are undefined if the compression + method is any other. + + Bit 3: If this bit is set, the fields crc-32, compressed + size and uncompressed size are set to zero in the + local header. The correct values are put in the + data descriptor immediately following the compressed + data. 
(Note: PKZIP version 2.04g for DOS only + recognizes this bit for method 8 compression, newer + versions of PKZIP recognize this bit for any + compression method.) + + Bit 4: Reserved for use with method 8, for enhanced + deflating. + + Bit 5: If this bit is set, this indicates that the file is + compressed patched data. (Note: Requires PKZIP + version 2.70 or greater) + + Bit 6: Strong encryption. If this bit is set, you MUST + set the version needed to extract value to at least + 50 and you MUST also set bit 0. If AES encryption + is used, the version needed to extract value MUST + be at least 51. See the section describing the Strong + Encryption Specification for details. Refer to the + section in this document entitled "Incorporating PKWARE + Proprietary Technology into Your Product" for more + information. + + Bit 7: Currently unused. + + Bit 8: Currently unused. + + Bit 9: Currently unused. + + Bit 10: Currently unused. + + Bit 11: Language encoding flag (EFS). If this bit is set, + the filename and comment fields for this file + MUST be encoded using UTF-8. (see APPENDIX D) + + Bit 12: Reserved by PKWARE for enhanced compression. + + Bit 13: Set when encrypting the Central Directory to indicate + selected data values in the Local Header are masked to + hide their actual values. See the section describing + the Strong Encryption Specification for details. Refer + to the section in this document entitled "Incorporating + PKWARE Proprietary Technology into Your Product" for + more information. + + Bit 14: Reserved by PKWARE. + + Bit 15: Reserved by PKWARE. 
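The bit assignments in 4.4.4 can be summarized with a short Python sketch. The function name `decode_flags` and the dictionary keys are hypothetical; the sketch covers only the bits given a meaning above and treats bits 1 and 2 as undefined for compression methods other than 6, 8, 9 and 14.

```python
def decode_flags(flags: int, method: int) -> dict:
    """Interpret the general purpose bit flag for a given compression method."""
    info = {
        "encrypted": bool(flags & 0x0001),                # bit 0
        "has_data_descriptor": bool(flags & 0x0008),      # bit 3
        "compressed_patched_data": bool(flags & 0x0020),  # bit 5
        "strong_encryption": bool(flags & 0x0040),        # bit 6
        "utf8_names": bool(flags & 0x0800),               # bit 11
        "local_header_masked": bool(flags & 0x2000),      # bit 13
    }
    bit1 = (flags >> 1) & 1
    bit2 = (flags >> 2) & 1
    if method == 6:
        # Imploding: dictionary size and number of Shannon-Fano trees.
        info["implode_dictionary_bytes"] = 8192 if bit1 else 4096
        info["implode_trees"] = 3 if bit2 else 2
    elif method in (8, 9):
        # Deflating: the table above is keyed by (bit 2, bit 1).
        options = {
            (0, 0): "normal",
            (0, 1): "maximum",
            (1, 0): "fast",
            (1, 1): "super fast",
        }
        info["deflate_option"] = options[(bit2, bit1)]
    elif method == 14:
        # LZMA: bit 1 signals an end-of-stream marker.
        info["lzma_eos_marker"] = bool(bit1)
    return info
```

Passing the compression method alongside the flag value keeps the method-dependent meaning of bits 1 and 2 explicit, which is the main source of confusion when reading this field.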
+ + 4.4.5 compression method: (2 bytes) + + 0 - The file is stored (no compression) + 1 - The file is Shrunk + 2 - The file is Reduced with compression factor 1 + 3 - The file is Reduced with compression factor 2 + 4 - The file is Reduced with compression factor 3 + 5 - The file is Reduced with compression factor 4 + 6 - The file is Imploded + 7 - Reserved for Tokenizing compression algorithm + 8 - The file is Deflated + 9 - Enhanced Deflating using Deflate64(tm) + 10 - PKWARE Data Compression Library Imploding (old IBM TERSE) + 11 - Reserved by PKWARE + 12 - File is compressed using BZIP2 algorithm + 13 - Reserved by PKWARE + 14 - LZMA (EFS) + 15 - Reserved by PKWARE + 16 - Reserved by PKWARE + 17 - Reserved by PKWARE + 18 - File is compressed using IBM TERSE (new) + 19 - IBM LZ77 z Architecture (PFS) + 97 - WavPack compressed data + 98 - PPMd version I, Rev 1 + + + 4.4.6 date and time fields: (2 bytes each) + + The date and time are encoded in standard MS-DOS format. + If input came from standard input, the date and time are + those at which compression was started for this data. + If encrypting the central directory and general purpose bit + flag 13 is set indicating masking, the value stored in the + Local Header will be zero. + + 4.4.7 CRC-32: (4 bytes) + + The CRC-32 algorithm was generously contributed by + David Schwaderer and can be found in his excellent + book "C Programmers Guide to NetBIOS" published by + Howard W. Sams & Co. Inc. The 'magic number' for + the CRC is 0xdebb20e3. The proper CRC pre and post + conditioning is used, meaning that the CRC register + is pre-conditioned with all ones (a starting value + of 0xffffffff) and the value is post-conditioned by + taking the one's complement of the CRC residual. + If bit 3 of the general purpose flag is set, this + field is set to zero in the local header and the correct + value is put in the data descriptor and in the central + directory. 
When encrypting the central directory, if the + local header is not in ZIP64 format and general purpose + bit flag 13 is set indicating masking, the value stored + in the Local Header will be zero. + + 4.4.8 compressed size: (4 bytes) + 4.4.9 uncompressed size: (4 bytes) + + The size of the file compressed (4.4.8) and uncompressed + (4.4.9), respectively. When a decryption header is present it + will be placed in front of the file data and the value of the + compressed file size will include the bytes of the decryption + header. If bit 3 of the general purpose bit flag is set, + these fields are set to zero in the local header and the + correct values are put in the data descriptor and + in the central directory. If an archive is in ZIP64 format + and the value in this field is 0xFFFFFFFF, the size will be + in the corresponding 8 byte ZIP64 extended information + extra field. When encrypting the central directory, if the + local header is not in ZIP64 format and general purpose bit + flag 13 is set indicating masking, the value stored for the + uncompressed size in the Local Header will be zero. + + 4.4.10 file name length: (2 bytes) + 4.4.11 extra field length: (2 bytes) + 4.4.12 file comment length: (2 bytes) + + The length of the file name, extra field, and comment + fields respectively. The combined length of any + directory record and these three fields should not + generally exceed 65,535 bytes. If input came from standard + input, the file name length is set to zero. + + + 4.4.13 disk number start: (2 bytes) + + The number of the disk on which this file begins. If an + archive is in ZIP64 format and the value in this field is + 0xFFFF, the disk number will be in the corresponding 4 byte zip64 + extended information extra field. + + 4.4.14 internal file attributes: (2 bytes) + + Bits 1 and 2 are reserved for use by PKWARE. + + 4.4.14.1 The lowest bit of this field indicates, if set, + that the file is apparently an ASCII or text file.
If not + set, the file apparently contains binary data. + The remaining bits are unused in version 1.0. + + 4.4.14.2 The 0x0002 bit of this field indicates, if set, that + a 4 byte variable record length control field precedes each + logical record indicating the length of the record. The + record length control field is stored in little-endian byte + order. This flag is independent of text control characters, + and if used in conjunction with text data, includes any + control characters in the total length of the record. This + value is provided for mainframe data transfer support. + + 4.4.15 external file attributes: (4 bytes) + + The mapping of the external attributes is + host-system dependent (see 'version made by'). For + MS-DOS, the low order byte is the MS-DOS directory + attribute byte. If input came from standard input, this + field is set to zero. + + 4.4.16 relative offset of local header: (4 bytes) + + This is the offset from the start of the first disk on + which this file appears, to where the local header should + be found. If an archive is in ZIP64 format and the value + in this field is 0xFFFFFFFF, the offset will be in the + corresponding 8 byte zip64 extended information extra field. + + 4.4.17 file name: (Variable) + + 4.4.17.1 The name of the file, with optional relative path. + The path stored MUST NOT contain a drive or + device letter, or a leading slash. All slashes + MUST be forward slashes '/' as opposed to + backward slashes '\' for compatibility with Amiga + and UNIX file systems etc. If input came from standard + input, there is no file name field. + + 4.4.17.2 If using the Central Directory Encryption Feature and + general purpose bit flag 13 is set indicating masking, the file + name stored in the Local Header will not be the actual file name. + A masking value consisting of a unique hexadecimal value will + be stored. This value will be sequentially incremented for each + file in the archive.
See the section on the Strong Encryption + Specification for details on retrieving the encrypted file name. + Refer to the section in this document entitled "Incorporating PKWARE + Proprietary Technology into Your Product" for more information. + + + 4.4.18 file comment: (Variable) + + The comment for this file. + + 4.4.19 number of this disk: (2 bytes) + + The number of this disk, which contains the end of + central directory record. If an archive is in ZIP64 format + and the value in this field is 0xFFFF, the disk number will + be in the corresponding 4 byte zip64 end of central + directory field. + + + 4.4.20 number of the disk with the start of the central + directory: (2 bytes) + + The number of the disk on which the central + directory starts. If an archive is in ZIP64 format + and the value in this field is 0xFFFF, the disk number will + be in the corresponding 4 byte zip64 end of central + directory field. + + 4.4.21 total number of entries in the central dir on + this disk: (2 bytes) + + The number of central directory entries on this disk. + If an archive is in ZIP64 format and the value in + this field is 0xFFFF, the count will be in the + corresponding 8 byte zip64 end of central + directory field. + + 4.4.22 total number of entries in the central dir: (2 bytes) + + The total number of files in the .ZIP file. If an + archive is in ZIP64 format and the value in this field + is 0xFFFF, the count will be in the corresponding 8 byte + zip64 end of central directory field. + + 4.4.23 size of the central directory: (4 bytes) + + The size (in bytes) of the entire central directory. + If an archive is in ZIP64 format and the value in + this field is 0xFFFFFFFF, the size will be in the + corresponding 8 byte zip64 end of central + directory field. + + 4.4.24 offset of start of central directory with respect to + the starting disk number: (4 bytes) + + Offset of the start of the central directory on the + disk on which the central directory starts.
If an + archive is in ZIP64 format and the value in this + field is 0xFFFFFFFF, the offset will be in the + corresponding 8 byte zip64 end of central + directory field. + + 4.4.25 .ZIP file comment length: (2 bytes) + + The length of the comment for this .ZIP file. + + 4.4.26 .ZIP file comment: (Variable) + + The comment for this .ZIP file. ZIP file comment data + is stored unsecured. No encryption or data authentication + is applied to this area at this time. Confidential information + should not be stored in this section. + + 4.4.27 zip64 extensible data sector (variable size) + + (currently reserved for use by PKWARE) + + + 4.4.28 extra field: (Variable) + + This SHOULD be used for storage expansion. If additional + information needs to be stored within a ZIP file for special + application or platform needs, it SHOULD be stored here. + Programs supporting earlier versions of this specification can + then safely skip the file, and find the next file or header. + This field will be 0 length in version 1.0. + + Existing extra fields are defined in the section + Extensible data fields that follows. + +4.5 Extensible data fields +-------------------------- + + 4.5.1 In order to allow different programs and different types + of information to be stored in the 'extra' field in .ZIP + files, the following structure MUST be used for all + programs storing data in this field: + + header1+data1 + header2+data2 . . . + + Each header should consist of: + + Header ID - 2 bytes + Data Size - 2 bytes + + Note: all fields stored in Intel low-byte/high-byte order. + + The Header ID field indicates the type of data that is in + the following data block. + + Header IDs of 0 thru 31 are reserved for use by PKWARE. + The remaining IDs can be used by third party vendors for + proprietary usage.
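The header1+data1 + header2+data2 layout described in 4.5.1 can be walked with a short loop. The Python sketch below is one possible reading of that structure, not a reference implementation. The function names iter_extra_blocks and is_pkware_reserved are hypothetical, and the choice to raise ValueError on a truncated block and to ignore fewer than 4 trailing bytes is an assumption of this example.

```python
import struct


def iter_extra_blocks(extra):
    """Yield (header_id, data) for each header+data pair in an extra field.

    Each header is a 2 byte Header ID followed by a 2 byte Data Size,
    both stored low byte first.
    """
    pos = 0
    while pos + 4 <= len(extra):
        header_id, size = struct.unpack_from("<HH", extra, pos)
        pos += 4
        if pos + size > len(extra):
            # Assumption of this sketch: treat an overrunning block as an error.
            raise ValueError("extra field block overruns the field")
        yield header_id, extra[pos:pos + size]
        pos += size
    # Fewer than 4 leftover bytes cannot hold a header and are ignored here.


def is_pkware_reserved(header_id):
    """Header IDs 0 through 31 are reserved for use by PKWARE."""
    return 0 <= header_id <= 31
```

A caller that only cares about one Header ID can compare the first element of each pair and skip the rest, since the Data Size already tells the loop where the next header starts.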
+ + 4.5.2 The current Header ID mappings defined by PKWARE are: + + 0x0001 Zip64 extended information extra field + 0x0007 AV Info + 0x0008 Reserved for extended language encoding data (PFS) + (see APPENDIX D) + 0x0009 OS/2 + 0x000a NTFS + 0x000c OpenVMS + 0x000d UNIX + 0x000e Reserved for file stream and fork descriptors + 0x000f Patch Descriptor + 0x0014 PKCS#7 Store for X.509 Certificates + 0x0015 X.509 Certificate ID and Signature for + individual file + 0x0016 X.509 Certificate ID for Central Directory + 0x0017 Strong Encryption Header + 0x0018 Record Management Controls + 0x0019 PKCS#7 Encryption Recipient Certificate List + 0x0065 IBM S/390 (Z390), AS/400 (I400) attributes + - uncompressed + 0x0066 Reserved for IBM S/390 (Z390), AS/400 (I400) + attributes - compressed + 0x4690 POSZIP 4690 (reserved) + + + 4.5.3 -Zip64 Extended Information Extra Field (0x0001): + + The following is the layout of the zip64 extended + information "extra" block. If one of the size or + offset fields in the Local or Central directory + record is too small to hold the required data, + a Zip64 extended information record is created. + The order of the fields in the zip64 extended + information record is fixed, but the fields MUST + only appear if the corresponding Local or Central + directory record field is set to 0xFFFF or 0xFFFFFFFF. + + Note: all fields stored in Intel low-byte/high-byte order. + + Value Size Description + ----- ---- ----------- +(ZIP64) 0x0001 2 bytes Tag for this "extra" block type + Size 2 bytes Size of this "extra" block + Original + Size 8 bytes Original uncompressed file size + Compressed + Size 8 bytes Size of compressed data + Relative Header + Offset 8 bytes Offset of local header record + Disk Start + Number 4 bytes Number of the disk on which + this file starts + + This entry in the Local header MUST include BOTH original + and compressed file size fields. 
If encrypting the + central directory and bit 13 of the general purpose bit + flag is set indicating masking, the value stored in the + Local Header for the original file size will be zero. + + + 4.5.4 -OS/2 Extra Field (0x0009): + + The following is the layout of the OS/2 attributes "extra" + block. (Last Revision 09/05/95) + + Note: all fields stored in Intel low-byte/high-byte order. + + Value Size Description + ----- ---- ----------- +(OS/2) 0x0009 2 bytes Tag for this "extra" block type + TSize 2 bytes Size for the following data block + BSize 4 bytes Uncompressed Block Size + CType 2 bytes Compression type + EACRC 4 bytes CRC value for uncompressed block + (var) variable Compressed block + + The OS/2 extended attribute structure (FEA2LIST) is + compressed and then stored in its entirety within this + structure. There will only ever be one "block" of data in + VarFields[]. + + 4.5.5 -NTFS Extra Field (0x000a): + + The following is the layout of the NTFS attributes + "extra" block. (Note: At this time the Mtime, Atime + and Ctime values MAY be used on any WIN32 system.) + + Note: all fields stored in Intel low-byte/high-byte order. + + Value Size Description + ----- ---- ----------- +(NTFS) 0x000a 2 bytes Tag for this "extra" block type + TSize 2 bytes Size of the total "extra" block + Reserved 4 bytes Reserved for future use + Tag1 2 bytes NTFS attribute tag value #1 + Size1 2 bytes Size of attribute #1, in bytes + (var) Size1 Attribute #1 data + . + . + .
+ TagN 2 bytes NTFS attribute tag value #N + SizeN 2 bytes Size of attribute #N, in bytes + (var) SizeN Attribute #N data + + For NTFS, values for Tag1 through TagN are as follows: + (currently only one set of attributes is defined for NTFS) + + Tag Size Description + ----- ---- ----------- + 0x0001 2 bytes Tag for attribute #1 + Size1 2 bytes Size of attribute #1, in bytes + Mtime 8 bytes File last modification time + Atime 8 bytes File last access time + Ctime 8 bytes File creation time + + 4.5.6 -OpenVMS Extra Field (0x000c): + + The following is the layout of the OpenVMS attributes + "extra" block. + + Note: all fields stored in Intel low-byte/high-byte order. + + Value Size Description + ----- ---- ----------- + (VMS) 0x000c 2 bytes Tag for this "extra" block type + TSize 2 bytes Size of the total "extra" block + CRC 4 bytes 32-bit CRC for remainder of the block + Tag1 2 bytes OpenVMS attribute tag value #1 + Size1 2 bytes Size of attribute #1, in bytes + (var) Size1 Attribute #1 data + . + . + . + TagN 2 bytes OpenVMS attribute tag value #N + SizeN 2 bytes Size of attribute #N, in bytes + (var) SizeN Attribute #N data + + OpenVMS Extra Field Rules: + + 4.5.6.1. There will be one or more attributes present, which + will each be preceded by the above TagX & SizeX values. + These values are identical to the ATR$C_XXXX and ATR$S_XXXX + constants which are defined in ATR.H under OpenVMS C. Neither + of these values will ever be zero. + + 4.5.6.2. No word alignment or padding is performed. + + 4.5.6.3. A well-behaved PKZIP/OpenVMS program should never produce + more than one sub-block with the same TagX value. Also, there will + never be more than one "extra" block of type 0x000c in a particular + directory record. + + 4.5.7 -UNIX Extra Field (0x000d): + + The following is the layout of the UNIX "extra" block. + Note: all fields are stored in Intel low-byte/high-byte + order. 
+ + Value Size Description + ----- ---- ----------- +(UNIX) 0x000d 2 bytes Tag for this "extra" block type + TSize 2 bytes Size for the following data block + Atime 4 bytes File last access time + Mtime 4 bytes File last modification time + Uid 2 bytes File user ID + Gid 2 bytes File group ID + (var) variable Variable length data field + + The variable length data field will contain file type + specific data. Currently the only values allowed are + the original "linked to" file names for hard or symbolic + links, and the major and minor device node numbers for + character and block device nodes. Since device nodes + cannot be either symbolic or hard links, only one set of + variable length data is stored. Link files will have the + name of the original file stored. This name is NOT NULL + terminated. Its size can be determined by checking TSize - + 12. Device entries will have eight bytes stored as two 4 + byte entries (in little endian format). The first entry + will be the major device number, and the second the minor + device number. + + 4.5.8 -PATCH Descriptor Extra Field (0x000f): + + 4.5.8.1 The following is the layout of the Patch Descriptor + "extra" block. + + Note: all fields stored in Intel low-byte/high-byte order. 
+ + Value Size Description + ----- ---- ----------- +(Patch) 0x000f 2 bytes Tag for this "extra" block type + TSize 2 bytes Size of the total "extra" block + Version 2 bytes Version of the descriptor + Flags 4 bytes Actions and reactions (see below) + OldSize 4 bytes Size of the file about to be patched + OldCRC 4 bytes 32-bit CRC of the file to be patched + NewSize 4 bytes Size of the resulting file + NewCRC 4 bytes 32-bit CRC of the resulting file + + 4.5.8.2 Actions and reactions + + Bits Description + ---- ---------------- + 0 Use for auto detection + 1 Treat as a self-patch + 2-3 RESERVED + 4-5 Action (see below) + 6-7 RESERVED + 8-9 Reaction (see below) to absent file + 10-11 Reaction (see below) to newer file + 12-13 Reaction (see below) to unknown file + 14-15 RESERVED + 16-31 RESERVED + + 4.5.8.2.1 Actions + + Action Value + ------ ----- + none 0 + add 1 + delete 2 + patch 3 + + 4.5.8.2.2 Reactions + + Reaction Value + -------- ----- + ask 0 + skip 1 + ignore 2 + fail 3 + + 4.5.8.3 Patch support is provided by PKPatchMaker(tm) technology + and is covered under U.S. Patents and Patents Pending. The use or + implementation in a product of certain technological aspects set + forth in the current APPNOTE, including those with regard to + strong encryption or patching requires a license from PKWARE. + Refer to the section in this document entitled "Incorporating + PKWARE Proprietary Technology into Your Product" for more + information. + + 4.5.9 -PKCS#7 Store for X.509 Certificates (0x0014): + + This field MUST contain information about each of the certificates + files may be signed with. When the Central Directory Encryption + feature is enabled for a ZIP file, this record will appear in + the Archive Extra Data Record, otherwise it will appear in the + first central directory record and will be ignored in any + other record. + + + Note: all fields stored in Intel low-byte/high-byte order. 
+ + Value Size Description + ----- ---- ----------- +(Store) 0x0014 2 bytes Tag for this "extra" block type + TSize 2 bytes Size of the store data + TData TSize Data about the store + + + 4.5.10 -X.509 Certificate ID and Signature for individual file (0x0015): + + This field contains the information about which certificate in + the PKCS#7 store was used to sign a particular file. It also + contains the signature data. This field can appear multiple + times, but can only appear once per certificate. + + Note: all fields stored in Intel low-byte/high-byte order. + + Value Size Description + ----- ---- ----------- +(CID) 0x0015 2 bytes Tag for this "extra" block type + TSize 2 bytes Size of data that follows + TData TSize Signature Data + + 4.5.11 -X.509 Certificate ID and Signature for central directory (0x0016): + + This field contains the information about which certificate in + the PKCS#7 store was used to sign the central directory structure. + When the Central Directory Encryption feature is enabled for a + ZIP file, this record will appear in the Archive Extra Data Record, + otherwise it will appear in the first central directory record. + + Note: all fields stored in Intel low-byte/high-byte order. 
+ + Value Size Description + ----- ---- ----------- +(CDID) 0x0016 2 bytes Tag for this "extra" block type + TSize 2 bytes Size of data that follows + TData TSize Data + + 4.5.12 -Strong Encryption Header (0x0017): + + Value Size Description + ----- ---- ----------- + 0x0017 2 bytes Tag for this "extra" block type + TSize 2 bytes Size of data that follows + Format 2 bytes Format definition for this record + AlgID 2 bytes Encryption algorithm identifier + Bitlen 2 bytes Bit length of encryption key + Flags 2 bytes Processing flags + CertData TSize-8 Certificate decryption extra field data + (refer to the explanation for CertData + in the section describing the + Certificate Processing Method under + the Strong Encryption Specification) + + See the section describing the Strong Encryption Specification + for details. Refer to the section in this document entitled + "Incorporating PKWARE Proprietary Technology into Your Product" + for more information. + + 4.5.13 -Record Management Controls (0x0018): + + Value Size Description + ----- ---- ----------- +(Rec-CTL) 0x0018 2 bytes Tag for this "extra" block type + CSize 2 bytes Size of total extra block data + Tag1 2 bytes Record control attribute 1 + Size1 2 bytes Size of attribute 1, in bytes + Data1 Size1 Attribute 1 data + . + . + . + TagN 2 bytes Record control attribute N + SizeN 2 bytes Size of attribute N, in bytes + DataN SizeN Attribute N data + + + 4.5.14 -PKCS#7 Encryption Recipient Certificate List (0x0019): + + This field MAY contain information about each of the certificates + used in encryption processing and it can be used to identify who is + allowed to decrypt encrypted files. This field should only appear + in the archive extra data record. This field is not required and + serves only to aid archive modifications by preserving public + encryption key data. Individual security requirements may dictate + that this data be omitted to deter information exposure. 
+ + Note: all fields stored in Intel low-byte/high-byte order. + + Value Size Description + ----- ---- ----------- +(CStore) 0x0019 2 bytes Tag for this "extra" block type + TSize 2 bytes Size of the store data + TData TSize Data about the store + + TData: + + Value Size Description + ----- ---- ----------- + Version 2 bytes Format version number - must be 0x0001 at this time + CStore (var) PKCS#7 data blob + + See the section describing the Strong Encryption Specification + for details. Refer to the section in this document entitled + "Incorporating PKWARE Proprietary Technology into Your Product" + for more information. + + 4.5.15 -MVS Extra Field (0x0065): + + The following is the layout of the MVS "extra" block. + Note: Some fields are stored in Big Endian format. + All text is in EBCDIC format unless otherwise specified.
+ + Value Size Description + ----- ---- ----------- +(OS400) 0x0065 2 bytes Tag for this "extra" block type + TSize 2 bytes Size for the following data block + ID 4 bytes EBCDIC "I400" 0xC9F4F0F0 or + "T4MV" for TargetFour + (var) TSize-4 Attribute data (see APPENDIX A) + +4.6 Third Party Mappings +------------------------ + + 4.6.1 Third party mappings commonly used are: + + 0x07c8 Macintosh + 0x2605 ZipIt Macintosh + 0x2705 ZipIt Macintosh 1.3.5+ + 0x2805 ZipIt Macintosh 1.3.5+ + 0x334d Info-ZIP Macintosh + 0x4341 Acorn/SparkFS + 0x4453 Windows NT security descriptor (binary ACL) + 0x4704 VM/CMS + 0x470f MVS + 0x4b46 FWKCS MD5 (see below) + 0x4c41 OS/2 access control list (text ACL) + 0x4d49 Info-ZIP OpenVMS + 0x4f4c Xceed original location extra field + 0x5356 AOS/VS (ACL) + 0x5455 extended timestamp + 0x554e Xceed unicode extra field + 0x5855 Info-ZIP UNIX (original, also OS/2, NT, etc) + 0x6375 Info-ZIP Unicode Comment Extra Field + 0x6542 BeOS/BeBox + 0x7075 Info-ZIP Unicode Path Extra Field + 0x756e ASi UNIX + 0x7855 Info-ZIP UNIX (new) + 0xa220 Microsoft Open Packaging Growth Hint + 0xfd4a SMS/QDOS + + Detailed descriptions of Extra Fields defined by third + party mappings will be documented as information on + these data structures is made available to PKWARE. + PKWARE does not guarantee the accuracy of any published + third party data. + + 4.6.2 Third-party Extra Fields must include a Header ID using + the format defined in the section of this document + titled Extensible Data Fields (section 4.5). + + The Data Size field indicates the size of the following + data block. Programs can use this value to skip to the + next header block, passing over any data blocks that are + not of interest. + + Note: As stated above, the size of the entire .ZIP file + header, including the file name, comment, and extra + field should not exceed 64K in size. 
+ + 4.6.3 In case two different programs should appropriate the same + Header ID value, it is strongly recommended that each + program SHOULD place a unique signature of at least two bytes in + size (and preferably 4 bytes or bigger) at the start of + each data area. Every program SHOULD verify that its + unique signature is present, in addition to the Header ID + value being correct, before assuming that it is a block of + known type. + + Third-party Mappings: + + 4.6.4 -ZipIt Macintosh Extra Field (long) (0x2605): + + The following is the layout of the ZipIt extra block + for Macintosh. The local-header and central-header versions + are identical. This block must be present if the file is + stored MacBinary-encoded and it should not be used if the file + is not stored MacBinary-encoded. + + Value Size Description + ----- ---- ----------- + (Mac2) 0x2605 Short tag for this extra block type + TSize Short total data size for this block + "ZPIT" beLong extra-field signature + FnLen Byte length of FileName + FileName variable full Macintosh filename + FileType Byte[4] four-byte Mac file type string + Creator Byte[4] four-byte Mac creator string + + + 4.6.5 -ZipIt Macintosh Extra Field (short, for files) (0x2705): + + The following is the layout of a shortened variant of the + ZipIt extra block for Macintosh (without "full name" entry). + This variant is used by ZipIt 1.3.5 and newer for entries of + files (not directories) that do not have a MacBinary encoded + file. The local-header and central-header versions are identical. 
+ + Value Size Description + ----- ---- ----------- + (Mac2b) 0x2705 Short tag for this extra block type + TSize Short total data size for this block (12) + "ZPIT" beLong extra-field signature + FileType Byte[4] four-byte Mac file type string + Creator Byte[4] four-byte Mac creator string + fdFlags beShort attributes from FInfo.frFlags, + may be omitted + 0x0000 beShort reserved, may be omitted + + + 4.6.6 -ZipIt Macintosh Extra Field (short, for directories) (0x2805): + + The following is the layout of a shortened variant of the + ZipIt extra block for Macintosh used only for directory + entries. This variant is used by ZipIt 1.3.5 and newer to + save some optional Mac-specific information about directories. + The local-header and central-header versions are identical. + + Value Size Description + ----- ---- ----------- + (Mac2c) 0x2805 Short tag for this extra block type + TSize Short total data size for this block (12) + "ZPIT" beLong extra-field signature + frFlags beShort attributes from DInfo.frFlags, may + be omitted + View beShort ZipIt view flag, may be omitted + + + The View field specifies ZipIt-internal settings as follows: + + Bits of the Flags: + bit 0 if set, the folder is shown expanded (open) + when the archive contents are viewed in ZipIt. + bits 1-15 reserved, zero; + + + 4.6.7 -FWKCS MD5 Extra Field (0x4b46): + + The FWKCS Contents_Signature System, used in + automatically identifying files independent of file name, + optionally adds and uses an extra field to support the + rapid creation of an enhanced contents_signature: + + Header ID = 0x4b46 + Data Size = 0x0013 + Preface = 'M','D','5' + followed by 16 bytes containing the uncompressed file's + 128_bit MD5 hash(1), low byte first. + + When FWKCS revises a .ZIP file central directory to add + this extra field for a file, it also replaces the + central directory entry for that file's uncompressed + file length with a measured value. 
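The field layout given above (Header ID 0x4b46, Data Size 0x0013, the preface 'M','D','5' and a 16 byte hash) can be assembled as in the following Python sketch. It is not taken from FWKCS itself. The function name build_fwkcs_md5_field is hypothetical, and the sketch assumes that "low byte first" corresponds to the byte order of the standard MD5 digest output as returned by hashlib.

```python
import hashlib
import struct


def build_fwkcs_md5_field(uncompressed):
    """Build an extra field block carrying the MD5 of the uncompressed data.

    Assumption of this sketch: the digest bytes are stored exactly as
    hashlib returns them.
    """
    digest = hashlib.md5(uncompressed).digest()   # 16 bytes
    data = b"MD5" + digest                        # 3 + 16 = 19 = 0x0013
    # Header ID and Data Size are stored low byte first.
    return struct.pack("<HH", 0x4B46, len(data)) + data
```

The resulting block is 23 bytes long: 4 bytes of header followed by the 19 byte data area.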
+ + FWKCS provides an option to strip this extra field, if + present, from a .ZIP file central directory. In adding + this extra field, FWKCS preserves .ZIP file Authenticity + Verification; if stripping this extra field, FWKCS + preserves all versions of AV through PKZIP version 2.04g. + + FWKCS, and FWKCS Contents_Signature System, are + trademarks of Frederick W. Kantor. + + (1) R. Rivest, RFC1321.TXT, MIT Laboratory for Computer + Science and RSA Data Security, Inc., April 1992. + ll.76-77: "The MD5 algorithm is being placed in the + public domain for review and possible adoption as a + standard." + + + 4.6.8 -Info-ZIP Unicode Comment Extra Field (0x6375): + + Stores the UTF-8 version of the file comment as stored in the + central directory header. (Last Revision 20070912) + + Value Size Description + ----- ---- ----------- + (UCom) 0x6375 Short tag for this extra block type ("uc") + TSize Short total data size for this block + Version 1 byte version of this extra field, currently 1 + ComCRC32 4 bytes Comment Field CRC32 Checksum + UnicodeCom Variable UTF-8 version of the entry comment + + Currently Version is set to the number 1. If there is a need + to change this field, the version will be incremented. Changes + may not be backward compatible so this extra field should not be + used if the version is not recognized. + + The ComCRC32 is the standard zip CRC32 checksum of the File Comment + field in the central directory header. This is used to verify that + the comment field has not changed since the Unicode Comment extra field + was created. This can happen if a utility changes the File Comment + field but does not update the UTF-8 Comment extra field. If the CRC + check fails, this Unicode Comment extra field should be ignored and + the File Comment field in the header should be used instead. + + The UnicodeCom field is the UTF-8 version of the File Comment field + in the header. 
As UnicodeCom is defined to be UTF-8, no UTF-8 byte + order mark (BOM) is used. The length of this field is determined by + subtracting the size of the previous fields from TSize. If both the + File Name and Comment fields are UTF-8, the new General Purpose Bit + Flag, bit 11 (Language encoding flag (EFS)), can be used to indicate + both the header File Name and Comment fields are UTF-8 and, in this + case, the Unicode Path and Unicode Comment extra fields are not + needed and should not be created. Note that, for backward + compatibility, bit 11 should only be used if the native character set + of the paths and comments being zipped up are already in UTF-8. It is + expected that the same file comment storage method, either general + purpose bit 11 or extra fields, be used in both the Local and Central + Directory Header for a file. + + + 4.6.9 -Info-ZIP Unicode Path Extra Field (0x7075): + + Stores the UTF-8 version of the file name field as stored in the + local header and central directory header. (Last Revision 20070912) + + Value Size Description + ----- ---- ----------- + (UPath) 0x7075 Short tag for this extra block type ("up") + TSize Short total data size for this block + Version 1 byte version of this extra field, currently 1 + NameCRC32 4 bytes File Name Field CRC32 Checksum + UnicodeName Variable UTF-8 version of the entry File Name + + Currently Version is set to the number 1. If there is a need + to change this field, the version will be incremented. Changes + may not be backward compatible so this extra field should not be + used if the version is not recognized. + + The NameCRC32 is the standard zip CRC32 checksum of the File Name + field in the header. This is used to verify that the header + File Name field has not changed since the Unicode Path extra field + was created. This can happen if a utility renames the File Name but + does not update the UTF-8 path extra field. 
If the CRC check fails, + this UTF-8 Path Extra Field should be ignored and the File Name field + in the header should be used instead. + + The UnicodeName is the UTF-8 version of the contents of the File Name + field in the header. As UnicodeName is defined to be UTF-8, no UTF-8 + byte order mark (BOM) is used. The length of this field is determined + by subtracting the size of the previous fields from TSize. If both + the File Name and Comment fields are UTF-8, the new General Purpose + Bit Flag, bit 11 (Language encoding flag (EFS)), can be used to + indicate that both the header File Name and Comment fields are UTF-8 + and, in this case, the Unicode Path and Unicode Comment extra fields + are not needed and should not be created. Note that, for backward + compatibility, bit 11 should only be used if the native character set + of the paths and comments being zipped up are already in UTF-8. It is + expected that the same file name storage method, either general + purpose bit 11 or extra fields, be used in both the Local and Central + Directory Header for a file. + + + 4.6.10 -Microsoft Open Packaging Growth Hint (0xa220): + + Value Size Description + ----- ---- ----------- + 0xa220 Short tag for this extra block type + TSize Short size of Sig + PadVal + Padding + Sig Short verification signature (A028) + PadVal Short Initial padding value + Padding variable filled with NULL characters + +4.7 Manifest Files +------------------ + + 4.7.1 Applications using ZIP files may have a need for additional + information that must be included with the files placed into + a ZIP file. Application specific information that cannot be + stored using the defined ZIP storage records SHOULD be stored + using the extensible Extra Field convention defined in this + document. However, some applications may use a manifest + file as a means for storing additional information. 
One + example is the META-INF/MANIFEST.MF file used in ZIP formatted + files having the .JAR extension (JAR files). + + 4.7.2 A manifest file is a file created for the application process + that requires this information. A manifest file MAY be of any + file type required by the defining application process. It is + placed within the same ZIP file as files to which this information + applies. By convention, this file is typically the first file placed + into the ZIP file and it may include a defined directory path. + + 4.7.3 Manifest files may be compressed or encrypted as needed for + application processing of the files inside the ZIP files. + + Manifest files are outside of the scope of this specification. + + +5.0 Explanation of compression methods +-------------------------------------- + + +5.1 UnShrinking - Method 1 +-------------------------- + + 5.1.1 Shrinking is a Dynamic Ziv-Lempel-Welch compression algorithm + with partial clearing. The initial code size is 9 bits, and the + maximum code size is 13 bits. Shrinking differs from conventional + Dynamic Ziv-Lempel-Welch implementations in several respects: + + 5.1.2 The code size is controlled by the compressor, and is + not automatically increased when codes larger than the current + code size are created (but not necessarily used). When + the decompressor encounters the code sequence 256 + (decimal) followed by 1, it should increase the code size + read from the input stream to the next bit size. No + blocking of the codes is performed, so the next code at + the increased size should be read from the input stream + immediately after where the previous code at the smaller + bit size was read. Again, the decompressor should not + increase the code size used until the sequence 256,1 is + encountered. + + 5.1.3 When the table becomes full, total clearing is not + performed. 
Rather, when the compressor emits the code + sequence 256,2 (decimal), the decompressor should clear + all leaf nodes from the Ziv-Lempel tree, and continue to + use the current code size. The nodes that are cleared + from the Ziv-Lempel tree are then re-used, with the lowest + code value re-used first, and the highest code value + re-used last. The compressor can emit the sequence 256,2 + at any time. + +5.2 Expanding - Methods 2-5 +--------------------------- + + 5.2.1 The Reducing algorithm is actually a combination of two + distinct algorithms. The first algorithm compresses repeated + byte sequences, and the second algorithm takes the compressed + stream from the first algorithm and applies a probabilistic + compression method. + + 5.2.2 The probabilistic compression stores an array of 'follower + sets' S(j), for j=0 to 255, corresponding to each possible + ASCII character. Each set contains between 0 and 32 + characters, to be denoted as S(j)[0],...,S(j)[m], where m<32. + The sets are stored at the beginning of the data area for a + Reduced file, in reverse order, with S(255) first, and S(0) + last. + + 5.2.3 The sets are encoded as { N(j), S(j)[0],...,S(j)[N(j)-1] }, + where N(j) is the size of set S(j). N(j) can be 0, in which + case the follower set for S(j) is empty. Each N(j) value is + encoded in 6 bits, followed by N(j) eight bit character values + corresponding to S(j)[0] to S(j)[N(j)-1] respectively. If + N(j) is 0, then no values for S(j) are stored, and the value + for N(j-1) immediately follows. + + 5.2.4 Immediately after the follower sets, is the compressed data + stream. The compressed data stream can be interpreted for the + probabilistic decompression as follows: + + let Last-Character <- 0. + loop until done + if the follower set S(Last-Character) is empty then + read 8 bits from the input stream, and copy this + value to the output stream. 
+ otherwise if the follower set S(Last-Character) is non-empty then + read 1 bit from the input stream. + if this bit is not zero then + read 8 bits from the input stream, and copy this + value to the output stream. + otherwise if this bit is zero then + read B(N(Last-Character)) bits from the input + stream, and assign this value to I. + Copy the value of S(Last-Character)[I] to the + output stream. + + assign the last value placed on the output stream to + Last-Character. + end loop + + B(N(j)) is defined as the minimal number of bits required to + encode the value N(j)-1. + + 5.2.5 The decompressed stream from above can then be expanded to + re-create the original file as follows: + + let State <- 0. + + loop until done + read 8 bits from the input stream into C. + case State of + 0: if C is not equal to DLE (144 decimal) then + copy C to the output stream. + otherwise if C is equal to DLE then + let State <- 1. + + 1: if C is non-zero then + let V <- C. + let Len <- L(V) + let State <- F(Len). + otherwise if C is zero then + copy the value 144 (decimal) to the output stream. + let State <- 0 + + 2: let Len <- Len + C + let State <- 3. + + 3: move backwards D(V,C) bytes in the output stream + (if this position is before the start of the output + stream, then assume that all the data before the + start of the output stream is filled with zeros). + copy Len+3 bytes from this position to the output stream. + let State <- 0. + end case + end loop + + The functions F,L, and D are dependent on the 'compression + factor', 1 through 4, and are defined as follows: + + For compression factor 1: + L(X) equals the lower 7 bits of X. + F(X) equals 2 if X equals 127 otherwise F(X) equals 3. + D(X,Y) equals the (upper 1 bit of X) * 256 + Y + 1. + For compression factor 2: + L(X) equals the lower 6 bits of X. + F(X) equals 2 if X equals 63 otherwise F(X) equals 3. + D(X,Y) equals the (upper 2 bits of X) * 256 + Y + 1. 
+ For compression factor 3: + L(X) equals the lower 5 bits of X. + F(X) equals 2 if X equals 31 otherwise F(X) equals 3. + D(X,Y) equals the (upper 3 bits of X) * 256 + Y + 1. + For compression factor 4: + L(X) equals the lower 4 bits of X. + F(X) equals 2 if X equals 15 otherwise F(X) equals 3. + D(X,Y) equals the (upper 4 bits of X) * 256 + Y + 1. + +5.3 Imploding - Method 6 +------------------------ + + 5.3.1 The Imploding algorithm is a combination of two + distinct algorithms. The first algorithm compresses repeated byte + sequences using a sliding dictionary. The second algorithm + compresses the encoding of the sliding dictionary output, + using multiple Shannon-Fano trees. + + 5.3.2 The Imploding algorithm can use a 4K or 8K sliding dictionary + size. The dictionary size used is determined by bit 1 in the + general purpose flag word; a 0 bit indicates a 4K dictionary + while a 1 bit indicates an 8K dictionary. + + 5.3.3 The Shannon-Fano trees are stored at the start of the + compressed file. The number of trees stored is defined by bit 2 in + the general purpose flag word; a 0 bit indicates two trees are stored, + while a 1 bit indicates three trees are stored. If 3 trees are stored, + the first Shannon-Fano tree represents the encoding of the + Literal characters, the second tree represents the encoding of + the Length information, and the third represents the encoding of the + Distance information. When 2 Shannon-Fano trees are stored, the + Length tree is stored first, followed by the Distance tree. + + 5.3.4 The Literal Shannon-Fano tree, if present, is used to represent + the entire ASCII character set, and contains 256 values. This + tree is used to compress any data not compressed by the sliding + dictionary algorithm. When this tree is present, the Minimum + Match Length for the sliding dictionary is 3. If this tree is + not present, the Minimum Match Length is 2.
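As an illustration of 5.3.2 through 5.3.4, the following is a minimal sketch of how a reader might derive the Imploding parameters from the general purpose flag word. The function name implode_parameters and the returned key names are hypothetical and are not part of this specification; the sketch assumes bit 0 is the least significant bit of the flag word.

```python
def implode_parameters(general_purpose_flags: int) -> dict:
    """Derive Imploding (method 6) settings from the general purpose flag word.

    Hypothetical helper for illustration only; assumes bit 0 is the
    least significant bit of the 16-bit flag word.
    """
    # Bit 1: 0 selects a 4K sliding dictionary, 1 selects an 8K dictionary.
    dictionary_size = 8192 if general_purpose_flags & 0x0002 else 4096

    # Bit 2: 0 means two trees (Length, Distance); 1 means three trees
    # (Literal, Length, Distance).
    has_literal_tree = bool(general_purpose_flags & 0x0004)

    return {
        "dictionary_size": dictionary_size,
        "tree_count": 3 if has_literal_tree else 2,
        # The Minimum Match Length is 3 only when the Literal tree is present.
        "minimum_match_length": 3 if has_literal_tree else 2,
    }
```

With both bits clear, the sketch reports a 4K dictionary, two trees, and a Minimum Match Length of 2; with both bits set, it reports an 8K dictionary, three trees, and a Minimum Match Length of 3.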
+ + 5.3.5 The Length Shannon-Fano tree is used to compress the Length + part of the (length,distance) pairs from the sliding dictionary + output. The Length tree contains 64 values, ranging from the + Minimum Match Length, to 63 plus the Minimum Match Length. + + 5.3.6 The Distance Shannon-Fano tree is used to compress the Distance + part of the (length,distance) pairs from the sliding dictionary + output. The Distance tree contains 64 values, ranging from 0 to + 63, representing the upper 6 bits of the distance value. The + distance values themselves will be between 0 and the sliding + dictionary size, either 4K or 8K. + + 5.3.7 The Shannon-Fano trees themselves are stored in a compressed + format. The first byte of the tree data represents the number of + bytes of data representing the (compressed) Shannon-Fano tree + minus 1. The remaining bytes represent the Shannon-Fano tree + data encoded as: + + High 4 bits: Number of values at this bit length + 1. (1 - 16) + Low 4 bits: Bit Length needed to represent value + 1. (1 - 16) + + 5.3.8 The Shannon-Fano codes can be constructed from the bit lengths + using the following algorithm: + + 1) Sort the Bit Lengths in ascending order, while retaining the + order of the original lengths stored in the file. + + 2) Generate the Shannon-Fano trees: + + Code <- 0 + CodeIncrement <- 0 + LastBitLength <- 0 + i <- number of Shannon-Fano codes - 1 (either 255 or 63) + + loop while i >= 0 + Code = Code + CodeIncrement + if BitLength(i) <> LastBitLength then + LastBitLength=BitLength(i) + CodeIncrement = 1 shifted left (16 - LastBitLength) + ShannonCode(i) = Code + i <- i - 1 + end loop + + 3) Reverse the order of all the bits in the above ShannonCode() + vector, so that the most significant bit becomes the least + significant bit. For example, the value 0x1234 (hex) would + become 0x2C48 (hex). + + 4) Restore the order of Shannon-Fano codes as originally stored + within the file. 
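The tree storage format in 5.3.7 and the construction steps in 5.3.8 can be sketched as follows. This is an illustrative reading of the text, not a reference implementation: the function names decode_bit_lengths, reverse_bits16, and build_shannon_fano_codes are hypothetical, and step 1 is assumed to mean a stable sort (entries of equal bit length keep their stored order).

```python
def decode_bit_lengths(tree_data: bytes) -> list:
    """Expand the compressed tree data of 5.3.7 into one bit length per value."""
    byte_count = tree_data[0] + 1          # first byte holds (byte count - 1)
    lengths = []
    for packed in tree_data[1:1 + byte_count]:
        count = (packed >> 4) + 1          # high 4 bits: number of values - 1
        bit_length = (packed & 0x0F) + 1   # low 4 bits: bit length - 1
        lengths.extend([bit_length] * count)
    return lengths


def reverse_bits16(value: int) -> int:
    """Reverse the order of the 16 bits in value (step 3)."""
    result = 0
    for _ in range(16):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def build_shannon_fano_codes(bit_lengths: list) -> list:
    """Return the bit-reversed code for each value, in stored order."""
    count = len(bit_lengths)
    # Step 1: sort by bit length; sorted() is stable, so ties keep file order.
    order = sorted(range(count), key=lambda index: bit_lengths[index])
    sorted_lengths = [bit_lengths[index] for index in order]

    # Step 2: assign left-aligned 16-bit codes, walking from the last entry.
    code = 0
    code_increment = 0
    last_bit_length = 0
    sorted_codes = [0] * count
    for i in range(count - 1, -1, -1):
        code += code_increment
        if sorted_lengths[i] != last_bit_length:
            last_bit_length = sorted_lengths[i]
            code_increment = 1 << (16 - last_bit_length)
        sorted_codes[i] = code

    # Steps 3 and 4: reverse the bits, then restore the stored order.
    codes = [0] * count
    for position, original_index in enumerate(order):
        codes[original_index] = reverse_bits16(sorted_codes[position])
    return codes
```

Each returned code is an integer whose significant bits are the low BitLength bits; for example, a value of 5 with a bit length of 3 corresponds to the bit string 101.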
+
+      Example:
+
+         This example will show the encoding of a Shannon-Fano tree
+         of size 8.  Notice that the actual Shannon-Fano trees used
+         for Imploding are either 64 or 256 entries in size.
+
+      Example:   0x02, 0x42, 0x01, 0x13
+
+         The first byte indicates 3 values in this table.  Decoding the
+         bytes:
+                 0x42 = 5 codes of 3 bits long
+                 0x01 = 1 code  of 2 bits long
+                 0x13 = 2 codes of 4 bits long
+
+         This would generate the original bit length array of:
+         (3, 3, 3, 3, 3, 2, 4, 4)
+
+         There are 8 codes in this table for the values 0 thru 7.  Using
+         the algorithm to obtain the Shannon-Fano codes produces:
+
+                                       Reversed   Order      Original
+         Val   Sorted   Constructed Code   Value      Restored   Length
+         ---   ------   ----------------   --------   --------   --------
+         0:    2        1100000000000000   11         101        3
+         1:    3        1010000000000000   101        001        3
+         2:    3        1000000000000000   001        110        3
+         3:    3        0110000000000000   110        010        3
+         4:    3        0100000000000000   010        100        3
+         5:    3        0010000000000000   100        11         2
+         6:    4        0001000000000000   1000       1000       4
+         7:    4        0000000000000000   0000       0000       4
+
+         The values in the Val, Order Restored and Original Length columns
+         now represent the Shannon-Fano encoding tree that can be used for
+         decoding the Shannon-Fano encoded data.  How to parse the
+         variable length Shannon-Fano values from the data stream is beyond
+         the scope of this document.  (See the references listed at the end of
+         this document for more information.)  However, traditional decoding
+         schemes used for Huffman variable length decoding, such as the
+         Greenlaw algorithm, can be successfully applied.
+
+    5.3.9 The compressed data stream begins immediately after the
+    compressed Shannon-Fano data.  The compressed data stream can be
+    interpreted as follows:
+
+        loop until done
+            read 1 bit from input stream.
+
+            if this bit is non-zero then       (encoded data is literal data)
+                if Liter
<TRUNCATED>
