Author: bodewig Date: Sat Mar 28 14:46:32 2009 New Revision: 759472 URL: http://svn.apache.org/viewvc?rev=759472&view=rev Log: some more in depth documentation
Added: commons/proper/compress/trunk/src/site/xdoc/examples.xml (with props) commons/proper/compress/trunk/src/site/xdoc/zip.xml (with props) Modified: commons/proper/compress/trunk/src/site/site.xml commons/proper/compress/trunk/src/site/xdoc/index.xml Modified: commons/proper/compress/trunk/src/site/site.xml URL: http://svn.apache.org/viewvc/commons/proper/compress/trunk/src/site/site.xml?rev=759472&r1=759471&r2=759472&view=diff ============================================================================== --- commons/proper/compress/trunk/src/site/site.xml (original) +++ commons/proper/compress/trunk/src/site/site.xml Sat Mar 28 14:46:32 2009 @@ -28,6 +28,7 @@ <body> <menu name="Compress"> <item name="Overview" href="/index.html"/> + <item name="Examples" href="/examples.html"/> <item name="Issue Tracking" href="/issue-tracking.html"/> <item name="Download" href="/downloads.html"/> <item name="Wiki" href="http://wiki.apache.org/commons/Compress"/> Added: commons/proper/compress/trunk/src/site/xdoc/examples.xml URL: http://svn.apache.org/viewvc/commons/proper/compress/trunk/src/site/xdoc/examples.xml?rev=759472&view=auto ============================================================================== --- commons/proper/compress/trunk/src/site/xdoc/examples.xml (added) +++ commons/proper/compress/trunk/src/site/xdoc/examples.xml Sat Mar 28 14:46:32 2009 @@ -0,0 +1,279 @@ +<?xml version="1.0"?> +<!-- + + Licensed to the Apache Software Foundation (ASF) under one or more + contributor license agreements. See the NOTICE file distributed with + this work for additional information regarding copyright ownership. + The ASF licenses this file to You under the Apache License, Version 2.0 + (the "License"); you may not use this file except in compliance with + the License. 
You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. + +--> +<document> + <properties> + <title>Commons Compress Examples</title> + <author email="d...@commons.apache.org">Commons Documentation Team</author> + </properties> + <body> + <section name="Examples"> + + <subsection name="Factories"> + + <p>Compress provides factory methods to create input/output + streams based on the names of the compressor or archiver + format as well as factory methods that try to guess the + format of an input stream.</p> + + <p>To create a compressor writing to a given output by using + the algorithm name:</p> + <source><![CDATA[ +CompressorOutputStream gzippedOut = new CompressorStreamFactory() + .createCompressorOutputStream("gz", myOutputStream); +]]></source> + + <p>Make the factory guess the input format for a given stream:</p> + <source><![CDATA[ +ArchiveInputStream input = new ArchiveStreamFactory() + .createArchiveInputStream(originalInput); +]]></source> + + </subsection> + + <subsection name="ar"> + + <p>In addition to the information stored + in <code>ArchiveEntry</code>, an <code>ArArchiveEntry</code> + stores information about the owning user and group as well as + Unix permissions.</p> + + <p>Adding an entry to an ar archive:</p> +<source><![CDATA[ +ArArchiveEntry entry = new ArArchiveEntry(name, size); +arOutput.putNextEntry(entry); +arOutput.write(contentOfEntry); +arOutput.closeArchiveEntry(); +]]></source> + + <p>Reading entries from an ar archive:</p> +<source><![CDATA[ +ArArchiveEntry entry = (ArArchiveEntry) arInput.getNextEntry(); +byte[] content = new byte[(int) entry.getSize()]; +LOOP UNTIL entry.getSize() HAS BEEN READ { 
+ arInput.read(content, offset, content.length - offset); +} +]]></source> + + </subsection> + + <subsection name="cpio"> + + <p>In addition to the information stored + in <code>ArchiveEntry</code>, a <code>CpioArchiveEntry</code> + stores various attributes including information about the + original owner and permissions.</p> + + <p>The cpio package supports the "new portable" as well as the + "old" format of CPIO archives in their binary, ASCII and + "with CRC" variants.</p> + + <p>Adding an entry to a cpio archive:</p> +<source><![CDATA[ +CpioArchiveEntry entry = new CpioArchiveEntry(name, size); +cpioOutput.putNextEntry(entry); +cpioOutput.write(contentOfEntry); +cpioOutput.closeArchiveEntry(); +]]></source> + + <p>Reading entries from a cpio archive:</p> +<source><![CDATA[ +CpioArchiveEntry entry = cpioInput.getNextCPIOEntry(); +byte[] content = new byte[(int) entry.getSize()]; +LOOP UNTIL entry.getSize() HAS BEEN READ { + cpioInput.read(content, offset, content.length - offset); +} +]]></source> + + </subsection> + + <subsection name="tar"> + + <p>In addition to the information stored + in <code>ArchiveEntry</code>, a <code>TarArchiveEntry</code> + stores various attributes including information about the + original owner and permissions.</p> + + <p>There are several different tar formats and the TAR package + of Compress 1.0 only provides the common functionality of + the existing variants.</p> + <p>The original format didn't support file names longer than + 100 characters and the tar package will fail if you try to + add an entry whose name is longer than that. + The <code>longFileMode</code> option + of <code>TarArchiveOutputStream</code> can be used to make + the archive truncate such names or use the GNU tar variant + of storing such names. 
If you choose the GNU tar option, + the archive cannot be extracted using many other tar + implementations like the ones of OpenBSD, Solaris or MacOS + X.</p> + + <p><code>TarArchiveInputStream</code> will recognize the GNU + tar extension for long file names and read the longer names + accordingly.</p> + + <p>Adding an entry to a tar archive:</p> +<source><![CDATA[ +TarArchiveEntry entry = new TarArchiveEntry(name); +entry.setSize(size); +tarOutput.putNextEntry(entry); +tarOutput.write(contentOfEntry); +tarOutput.closeArchiveEntry(); +]]></source> + + <p>Reading entries from a tar archive:</p> +<source><![CDATA[ +TarArchiveEntry entry = tarInput.getNextTarEntry(); +byte[] content = new byte[(int) entry.getSize()]; +LOOP UNTIL entry.getSize() HAS BEEN READ { + tarInput.read(content, offset, content.length - offset); +} +]]></source> + </subsection> + + <subsection name="zip"> + <p>The ZIP package has a <a href="zip.html">dedicated + documentation page</a>.</p> + + <p>Adding an entry to a zip archive:</p> +<source><![CDATA[ +ZipArchiveEntry entry = new ZipArchiveEntry(name); +entry.setSize(size); +zipOutput.putNextEntry(entry); +zipOutput.write(contentOfEntry); +zipOutput.closeArchiveEntry(); +]]></source> + + <p>Reading entries from a zip archive:</p> +<source><![CDATA[ +ZipArchiveEntry entry = zipInput.getNextZipEntry(); +byte[] content = new byte[(int) entry.getSize()]; +LOOP UNTIL entry.getSize() HAS BEEN READ { + zipInput.read(content, offset, content.length - offset); +} +]]></source> + + <p>Reading entries from a zip archive using the + recommended <code>ZipFile</code> class:</p> +<source><![CDATA[ +ZipArchiveEntry entry = zipFile.getEntry(name); +InputStream content = zipFile.getInputStream(entry); +try { + READ UNTIL content IS EXHAUSTED +} finally { + content.close(); +} +]]></source> + </subsection> + + <subsection name="jar"> + <p>In general, JAR archives are ZIP files, so the JAR package + supports all options provided by the ZIP package.</p> + + <p>To be interoperable JAR 
archives should always be created + using the UTF-8 encoding for file names (which is the + default).</p> + + <p>Archives created using <code>JarArchiveOutputStream</code> + will implicitly add a <code>JarMarker</code> extra field to + the very first archive entry of the archive, which makes + Solaris recognize them as Java archives and allows them to + be used as executables.</p> + + <p>Note that <code>ArchiveStreamFactory</code> doesn't + distinguish ZIP archives from JAR archives, so if you use + the one-argument <code>createArchiveInputStream</code> + method on a JAR archive, it will still return the more + generic <code>ZipArchiveInputStream</code>.</p> + + <p>The <code>JarArchiveEntry</code> class contains fields for + certificates and attributes that are planned to be supported + in the future but are not supported as of Compress 1.0.</p> + + <p>Adding an entry to a jar archive:</p> +<source><![CDATA[ +JarArchiveEntry entry = new JarArchiveEntry(name); +entry.setSize(size); +jarOutput.putNextEntry(entry); +jarOutput.write(contentOfEntry); +jarOutput.closeArchiveEntry(); +]]></source> + + <p>Reading entries from a jar archive:</p> +<source><![CDATA[ +JarArchiveEntry entry = jarInput.getNextJarEntry(); +byte[] content = new byte[(int) entry.getSize()]; +LOOP UNTIL entry.getSize() HAS BEEN READ { + jarInput.read(content, offset, content.length - offset); +} +]]></source> + </subsection> + + <subsection name="bzip2"> + + <p>Note that <code>BZip2CompressorOutputStream</code> keeps + hold of some big data structures in memory. 
While it is + recommended for any stream that you close it as soon as + you no longer need it, this is even more important + for <code>BZip2CompressorOutputStream</code>.</p> + + <p>Uncompressing a given bzip2 compressed file (you would + certainly add exception handling and make sure all streams + get closed properly):</p> +<source><![CDATA[ +FileInputStream in = new FileInputStream("archive.tar.bz2"); +FileOutputStream out = new FileOutputStream("archive.tar"); +BZip2CompressorInputStream bzIn = new BZip2CompressorInputStream(in); +final byte[] buffer = new byte[buffersize]; +int n = 0; +while (-1 != (n = bzIn.read(buffer))) { + out.write(buffer, 0, n); +} +out.close(); +bzIn.close(); +]]></source> + + </subsection> + + <subsection name="gzip"> + + <p>The implementation of this package is provided by + the <code>java.util.zip</code> package of the Java class + library.</p> + + <p>Uncompressing a given gzip compressed file (you would + certainly add exception handling and make sure all streams + get closed properly):</p> +<source><![CDATA[ +FileInputStream in = new FileInputStream("archive.tar.gz"); +FileOutputStream out = new FileOutputStream("archive.tar"); +GzipCompressorInputStream gzIn = new GzipCompressorInputStream(in); +final byte[] buffer = new byte[buffersize]; +int n = 0; +while (-1 != (n = gzIn.read(buffer))) { + out.write(buffer, 0, n); +} +out.close(); +gzIn.close(); +]]></source> + </subsection> + + </section> + </body> +</document> Propchange: commons/proper/compress/trunk/src/site/xdoc/examples.xml ------------------------------------------------------------------------------ svn:eol-style = native Modified: commons/proper/compress/trunk/src/site/xdoc/index.xml URL: http://svn.apache.org/viewvc/commons/proper/compress/trunk/src/site/xdoc/index.xml?rev=759472&r1=759471&r2=759472&view=diff ============================================================================== --- commons/proper/compress/trunk/src/site/xdoc/index.xml (original) +++ 
commons/proper/compress/trunk/src/site/xdoc/index.xml Sat Mar 28 14:46:32 2009 @@ -56,7 +56,34 @@ </subsection> </section> <section name="Documentation"> + <p>The compress component is split into <em>compressors</em> and + <em>archivers</em>. While <em>compressors</em> + (un)compress streams that usually store a single + entry, <em>archivers</em> deal with archives that contain + structured content represented + by <code>ArchiveEntry</code> instances which in turn + usually correspond to single files or directories.</p> + + <p>Currently the bzip2 and gzip formats are supported as + compressors where gzip support is provided by + the <code>java.util.zip</code> package of the Java class + library.</p> + + <p>The ar, cpio, tar and zip formats are supported as + archivers where the <a href="zip.html">zip</a> + implementation provides capabilities that go beyond the + features found in java.util.zip.</p> + + <p>The compress component provides abstract base classes for + compressors and archivers together with factories that can + be used to choose implementations by algorithm name. 
In + the case of input streams the factories can also be used + to guess the format and provide the matching + implementation.</p> + <ul> + <li>The <a href="examples.html">examples page</a> contains + more detailed information and some examples.</li> <li>The <a href="apidocs/index.html">Javadoc</a> of the latest SVN</li> <li>The <a href="http://svn.apache.org/viewvc/commons/proper/compress/">SVN repository</a> can be browsed.</li> Added: commons/proper/compress/trunk/src/site/xdoc/zip.xml URL: http://svn.apache.org/viewvc/commons/proper/compress/trunk/src/site/xdoc/zip.xml?rev=759472&view=auto ============================================================================== --- commons/proper/compress/trunk/src/site/xdoc/zip.xml (added) +++ commons/proper/compress/trunk/src/site/xdoc/zip.xml Sat Mar 28 14:46:32 2009 @@ -0,0 +1,226 @@ +<?xml version="1.0"?> +<!-- + + Licensed to the Apache Software Foundation (ASF) under one or more + contributor license agreements. See the NOTICE file distributed with + this work for additional information regarding copyright ownership. + The ASF licenses this file to You under the Apache License, Version 2.0 + (the "License"); you may not use this file except in compliance with + the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. 
+ +--> +<document> + <properties> + <title>Commons Compress ZIP package</title> + <author email="d...@commons.apache.org">Commons Documentation Team</author> + </properties> + <body> + <section name="The ZIP package"> + + <p>The ZIP package provides features not found + in <code>java.util.zip</code>:</p> + + <ul> + <li>Support for encodings other than UTF-8 for filenames and + comments.</li> + <li>Access to internal and external attributes (which are used + to store Unix permissions by some zip implementations).</li> + <li>Structured support for extra fields.</li> + </ul> + + <p>In addition to the information stored + in <code>ArchiveEntry</code>, a <code>ZipArchiveEntry</code> + stores internal and external attributes as well as extra + fields which may contain information like Unix permissions, + information about the platform they've been created on, their + last modification time and an optional comment.</p> + + <subsection name="ZipArchiveInputStream vs ZipFile"> + + <p>ZIP archives store archive entries in sequence and + contain a registry of all entries at the very end of the + archive. It is acceptable for an archive to contain several + entries of the same name and have the registry (called the + central directory) decide which entry is actually to be used + (if any).</p> + + <p>In addition, the ZIP format stores certain information only + inside the central directory but not together with the entry + itself, namely:</p> + + <ul> + <li>internal and external attributes</li> + <li>different or additional extra fields</li> + </ul> + + <p>This means the ZIP format cannot really be parsed + correctly while reading a non-seekable stream, which is what + <code>ZipArchiveInputStream</code> is forced to do. 
As a + result <code>ZipArchiveInputStream</code></p> + <ul> + <li>may return entries that are not part of the central + directory at all and shouldn't be considered part of the + archive.</li> + <li>may return several entries with the same name.</li> + <li>will not return internal or external attributes.</li> + <li>may return incomplete extra field data.</li> + </ul> + + <p><code>ZipArchiveInputStream</code> shares these limitations + with <code>java.util.zip.ZipInputStream</code>.</p> + + <p><code>ZipFile</code> is able to read the central directory + first and provide correct and complete information on any + ZIP archive.</p> + + <p>If possible, you should always prefer <code>ZipFile</code> + over <code>ZipArchiveInputStream</code>.</p> + </subsection> + + <subsection name="Extra Fields"> + + <p>Inside a ZIP archive, additional data can be attached to + each entry. The <code>java.util.zip.ZipEntry</code> class + provides access to this via the <code>get/setExtra</code> + methods as arrays of <code>byte</code>s.</p> + + <p>Actually the extra data is supposed to be more structured + than that and Compress' ZIP package provides access to the + structured data as <code>ExtraField</code> instances. Only + a subset of all defined extra field formats is supported by + the package; any other extra field will be stored + as <code>UnrecognizedExtraField</code>.</p> + + </subsection> + + <subsection name="Encoding" id="encoding"> + + <p>Traditionally the ZIP archive format uses CodePage 437 as + encoding for file names, which is not sufficient for many + international character sets.</p> + + <p>Over time different archivers have chosen different ways to + work around the limitation - the <code>java.util.zip</code> + package simply uses UTF-8 as its encoding, for example.</p> + + <p>Ant has been offering the encoding attribute of the zip and + unzip task as a way to explicitly specify the encoding to + use (or expect) since Ant 1.4. 
It defaults to the + platform's default encoding for zip and UTF-8 for jar and + other jar-like tasks (war, ear, ...) as well as the unzip + family of tasks.</p> + + <p>More recent versions of the ZIP specification introduce + something called the "language encoding flag" + which can be used to signal that a file name has been + encoded using UTF-8. All ZIP-archives written by Compress + will set this flag, if the encoding has been set to UTF-8. + Our interoperability tests with existing archivers didn't + show any ill effects (in fact, most archivers ignore the + flag to date), but you can turn off the "language encoding + flag" by setting the attribute + <code>useLanguageEncodingFlag</code> to <code>false</code> on the + <code>ZipArchiveOutputStream</code> if you should encounter + problems.</p> + + <p>The <code>ZipFile</code> + and <code>ZipArchiveInputStream</code> classes will + recognize the language encoding flag and ignore the encoding + set in the constructor if it has been found.</p> + + <p>The InfoZIP developers have introduced new ZIP extra fields + that can be used to add an additional UTF-8 encoded file + name to the entry's metadata. Most archivers ignore these + extra fields. 
<code>ZipArchiveOutputStream</code> supports + an option <code>createUnicodeExtraFields</code> which makes + it write these extra fields either for all entries + ("always") or only those whose name cannot be encoded using + the specified encoding ("not-encodeable"). It defaults to + "never" since the extra fields create bigger archives.</p> + + <p>The <code>fallbackToUTF8</code> attribute + of <code>ZipArchiveOutputStream</code> can be used to create + archives that use the specified encoding in the majority of + cases but UTF-8 and the language encoding flag for filenames + that cannot be encoded using the specified encoding.</p> + + <p>The <code>ZipFile</code> + and <code>ZipArchiveInputStream</code> classes recognize the + Unicode extra fields by default and read the file name + information from them, unless you set the constructor parameter + <code>scanForUnicodeExtraFields</code> to false.</p> + + <h4>Recommendations for Interoperability</h4> + + <p>The optimal setting of flags depends on the archivers you + expect as consumers/producers of the ZIP archives. Below + are some test results which may be superseded by later + versions of each tool.</p> + + <ul> + <li>The java.util.zip package used by the jar executable or + to read jars from your CLASSPATH reads and writes UTF-8 + names; it doesn't set or recognize any flags or Unicode + extra fields.</li> + + <li>7Zip writes CodePage 437 by default but uses UTF-8 and + the language encoding flag when writing entries that + cannot be encoded as CodePage 437 (similar to the zip task + with fallbacktoUTF8 set to true). It recognizes the + language encoding flag when reading and ignores the + Unicode extra fields.</li> + + <li>WinZIP writes CodePage 437 and uses Unicode extra fields + by default. 
It recognizes the Unicode extra field and the + language encoding flag when reading.</li> + + <li>Windows' "compressed folder" feature doesn't recognize + any flag or extra field and creates archives using the + platform's default encoding - and expects archives to be in + that encoding when reading them.</li> + + <li>InfoZIP based tools can recognize and write both; it is + a compile-time option and depends on the platform so your + mileage may vary.</li> + + <li>PKWARE zip tools recognize both and prefer the language + encoding flag. They create archives using CodePage 437 if + possible and UTF-8 plus the language encoding flag for + file names that cannot be encoded as CodePage 437.</li> + </ul> + + <p>So, what to do?</p> + + <p>If you are creating jars, then java.util.zip is your main + consumer. We recommend you set the encoding to UTF-8 and + keep the language encoding flag enabled. The flag won't + help or hurt java.util.zip but archivers that support it + will show the correct file names.</p> + + <p>For maximum interop it is probably best to set the encoding + to UTF-8, enable the language encoding flag and create + Unicode extra fields when writing ZIPs. Such archives + should be extracted correctly by java.util.zip, 7Zip, + WinZIP, PKWARE tools and most likely InfoZIP tools. They + will be unusable with Windows' "compressed folders" feature + and bigger than archives without the Unicode extra fields, + though.</p> + + <p>If Windows' "compressed folders" is your primary consumer, + then your best option is to explicitly set the encoding to + that of the target platform. You may want to enable creation of + Unicode extra fields so the tools that support them will + extract the file names correctly.</p> + </subsection> + + </section> + </body> +</document> Propchange: commons/proper/compress/trunk/src/site/xdoc/zip.xml ------------------------------------------------------------------------------ svn:eol-style = native
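Note on the read loops in examples.xml: the "Reading entries" snippets use pseudo-code (LOOP UNTIL entry.getSize() HAS BEEN READ) because a single read() call may return fewer bytes than requested. The following is a minimal sketch of one way to write that loop in Java. To stay self-contained it uses only JDK classes (java.util.zip), not Compress itself; the class name ReadEntryFully and the helper readFully are hypothetical and not part of Compress. Since the archive input streams of Compress extend java.io.InputStream, the same helper should be usable with them.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class ReadEntryFully {

    /**
     * Hypothetical helper: reads exactly size bytes from in.
     * The loop is needed because read() may deliver fewer bytes
     * than requested in a single call.
     */
    public static byte[] readFully(InputStream in, int size) throws IOException {
        byte[] content = new byte[size];
        int offset = 0;
        while (offset < size) {
            int n = in.read(content, offset, size - offset);
            if (n < 0) {
                // The stream ended before the announced size was reached.
                throw new EOFException("got " + offset + " of " + size + " bytes");
            }
            offset += n;
        }
        return content;
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = "hello, archive".getBytes(StandardCharsets.UTF_8);

        // Build a small in-memory ZIP archive with a single entry.
        ByteArrayOutputStream archive = new ByteArrayOutputStream();
        try (ZipOutputStream zipOut = new ZipOutputStream(archive)) {
            zipOut.putNextEntry(new ZipEntry("greeting.txt"));
            zipOut.write(payload);
            zipOut.closeEntry();
        }

        // Read the entry back. When streaming, entry.getSize() may be -1,
        // so this sketch passes the known payload length instead.
        try (ZipInputStream zipIn =
                 new ZipInputStream(new ByteArrayInputStream(archive.toByteArray()))) {
            ZipEntry entry = zipIn.getNextEntry();
            byte[] content = readFully(zipIn, payload.length);
            System.out.println(entry.getName() + ": "
                + new String(content, StandardCharsets.UTF_8));
            // prints "greeting.txt: hello, archive"
        }
    }
}
```

Throwing EOFException on a short stream is a design choice of this sketch; returning the bytes read so far would be an equally valid alternative, depending on how truncated archives should be handled.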