zip.xml

bodewig Sat, 28 Mar 2009 07:46:58 -0700

Author: bodewig
Date: Sat Mar 28 14:46:32 2009
New Revision: 759472

URL: http://svn.apache.org/viewvc?rev=759472&view=rev
Log:
some more in depth documentation


Added:
    commons/proper/compress/trunk/src/site/xdoc/examples.xml   (with props)
    commons/proper/compress/trunk/src/site/xdoc/zip.xml   (with props)
Modified:
    commons/proper/compress/trunk/src/site/site.xml
    commons/proper/compress/trunk/src/site/xdoc/index.xml

Modified: commons/proper/compress/trunk/src/site/site.xml
URL: 
http://svn.apache.org/viewvc/commons/proper/compress/trunk/src/site/site.xml?rev=759472&r1=759471&r2=759472&view=diff
==============================================================================
--- commons/proper/compress/trunk/src/site/site.xml (original)
+++ commons/proper/compress/trunk/src/site/site.xml Sat Mar 28 14:46:32 2009
@@ -28,6 +28,7 @@
   <body>
     <menu name="Compress">
       <item name="Overview"    href="/index.html"/>
+      <item name="Examples"    href="/examples.html"/>
       <item name="Issue Tracking" href="/issue-tracking.html"/>
       <item name="Download"    href="/downloads.html"/>
       <item name="Wiki"        href="http://wiki.apache.org/commons/Compress"/>

Added: commons/proper/compress/trunk/src/site/xdoc/examples.xml
URL: 
http://svn.apache.org/viewvc/commons/proper/compress/trunk/src/site/xdoc/examples.xml?rev=759472&view=auto
==============================================================================
--- commons/proper/compress/trunk/src/site/xdoc/examples.xml (added)
+++ commons/proper/compress/trunk/src/site/xdoc/examples.xml Sat Mar 28 14:46:32 2009
@@ -0,0 +1,279 @@
+<?xml version="1.0"?>
+<!--
+
+   Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements.  See the NOTICE file distributed with
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with
+   the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+
+-->
+<document>
+  <properties>
+    <title>Commons Compress Examples</title>
+    <author email="d...@commons.apache.org">Commons Documentation Team</author>
+  </properties>
+  <body>
+    <section name="Examples">
+
+      <subsection name="Factories">
+
+        <p>Compress provides factory methods to create input/output
+          streams based on the names of the compressor or archiver
+          format as well as factory methods that try to guess the
+          format of an input stream.</p>
+
+        <p>To create a compressor writing to a given output by using
+          the algorithm name:</p>
+        <source><![CDATA[
+CompressorOutputStream gzippedOut = new CompressorStreamFactory()
+    .createCompressorOutputStream("gz", myOutputStream);
+]]></source>
+
+        <p>Make the factory guess the input format for a given stream:</p>
+        <source><![CDATA[
+ArchiveInputStream input = new ArchiveStreamFactory()
+    .createArchiveInputStream(originalInput);
+]]></source>
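+
+        <p>As a rough sketch of the name-based variant for archivers
+          (the class name, the <code>readFirstEntryName</code> helper
+          and the file name parameter are made up for illustration,
+          and "tar" is assumed to be a format name known to the
+          factory):</p>
+<source><![CDATA[
+import java.io.FileInputStream;
+import java.io.InputStream;
+import org.apache.commons.compress.archivers.ArchiveEntry;
+import org.apache.commons.compress.archivers.ArchiveInputStream;
+import org.apache.commons.compress.archivers.ArchiveStreamFactory;
+
+public class FactoryExample {
+    // hypothetical helper: name of the first entry, or null for an empty archive
+    static String readFirstEntryName(String fileName) throws Exception {
+        InputStream in = new FileInputStream(fileName);
+        try {
+            ArchiveInputStream archive = new ArchiveStreamFactory()
+                .createArchiveInputStream("tar", in);
+            ArchiveEntry entry = archive.getNextEntry();
+            return entry == null ? null : entry.getName();
+        } finally {
+            in.close();
+        }
+    }
+}
+]]></source>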
+
+      </subsection>
+
+      <subsection name="ar">
+
+        <p>In addition to the information stored
+          in <code>ArchiveEntry</code> an <code>ArArchiveEntry</code>
+          stores information about the owner user and group as well as
+          Unix permissions.</p>
+
+        <p>Adding an entry to an ar archive:</p>
+<source><![CDATA[
+ArArchiveEntry entry = new ArArchiveEntry(name, size);
+arOutput.putNextEntry(entry);
+arOutput.write(contentOfEntry);
+arOutput.closeArchiveEntry();
+]]></source>
+
+        <p>Reading entries from an ar archive:</p>
+<source><![CDATA[
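+ArArchiveEntry entry = (ArArchiveEntry) arInput.getNextEntry();
+// getSize() returns a long; the cast assumes the entry fits into a byte array
+byte[] content = new byte[(int) entry.getSize()];
+int offset = 0;
+while (offset < content.length) {
+    int n = arInput.read(content, offset, content.length - offset);
+    if (n < 0) break; // premature end of stream
+    offset += n;
+}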
+]]></source>
+
+      </subsection>
+
+      <subsection name="cpio">
+
+        <p>In addition to the information stored
+          in <code>ArchiveEntry</code> a <code>CpioArchiveEntry</code>
+          stores various attributes including information about the
+          original owner and permissions.</p>
+
+        <p>The cpio package supports the "new portable" as well as the
+          "old" format of CPIO archives in their binary, ASCII and
+          "with CRC" variants.</p>
+
+        <p>Adding an entry to a cpio archive:</p>
+<source><![CDATA[
+CpioArchiveEntry entry = new CpioArchiveEntry(name, size);
+cpioOutput.putNextEntry(entry);
+cpioOutput.write(contentOfEntry);
+cpioOutput.closeArchiveEntry();
+]]></source>
+
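+        <p>Reading entries from a cpio archive:</p>
+<source><![CDATA[
+CpioArchiveEntry entry = cpioInput.getNextCPIOEntry();
+// getSize() returns a long; the cast assumes the entry fits into a byte array
+byte[] content = new byte[(int) entry.getSize()];
+int offset = 0;
+while (offset < content.length) {
+    int n = cpioInput.read(content, offset, content.length - offset);
+    if (n < 0) break; // premature end of stream
+    offset += n;
+}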
+]]></source>
+
+      </subsection>
+
+      <subsection name="tar">
+
+        <p>In addition to the information stored
+          in <code>ArchiveEntry</code> a <code>TarArchiveEntry</code>
+          stores various attributes including information about the
+          original owner and permissions.</p>
+
+        <p>There are several different tar formats and the TAR package
+          of Compress 1.0 only provides the common functionality of
+          the existing variants.</p>
+        <p>The original format didn't support file names longer than
+          100 characters and the tar package will fail if you try to
+          add an entry whose name is longer than that.
+          The <code>longFileMode</code> option
+          of <code>TarArchiveOutputStream</code> can be used to make
+          the archive truncate such names or use the GNU tar variant
+          of storing such names.  If you choose the GNU tar option,
+          the archive cannot be extracted using many other tar
+          implementations like the ones of OpenBSD, Solaris or MacOS
+          X.</p>
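+
+        <p>A minimal sketch of selecting the GNU variant, assuming
+          <code>tarOutput</code> is
+          a <code>TarArchiveOutputStream</code> and
+          <code>longName</code> is a placeholder for a name of more
+          than 100 characters:</p>
+<source><![CDATA[
+// store names that exceed 100 characters using the GNU tar extension
+tarOutput.setLongFileMode(TarArchiveOutputStream.LONGFILE_GNU);
+TarArchiveEntry entry = new TarArchiveEntry(longName);
+entry.setSize(size);
+tarOutput.putNextEntry(entry);
+tarOutput.write(contentOfEntry);
+tarOutput.closeArchiveEntry();
+]]></source>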
+
+        <p><code>TarArchiveInputStream</code> will recognize the GNU
+          tar extension for long file names and read the longer names
+          accordingly.</p>
+
+        <p>Adding an entry to a tar archive:</p>
+<source><![CDATA[
+TarArchiveEntry entry = new TarArchiveEntry(name);
+entry.setSize(size);
+tarOutput.putNextEntry(entry);
+tarOutput.write(contentOfEntry);
+tarOutput.closeArchiveEntry();
+]]></source>
+
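+        <p>Reading entries from a tar archive:</p>
+<source><![CDATA[
+TarArchiveEntry entry = tarInput.getNextTarEntry();
+// getSize() returns a long; the cast assumes the entry fits into a byte array
+byte[] content = new byte[(int) entry.getSize()];
+int offset = 0;
+while (offset < content.length) {
+    int n = tarInput.read(content, offset, content.length - offset);
+    if (n < 0) break; // premature end of stream
+    offset += n;
+}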
+]]></source>
+      </subsection>
+
+      <subsection name="zip">
+        <p>The ZIP package has a <a href="zip.html">dedicated
+            documentation page</a>.</p>
+
+        <p>Adding an entry to a zip archive:</p>
+<source><![CDATA[
+ZipArchiveEntry entry = new ZipArchiveEntry(name);
+entry.setSize(size);
+zipOutput.putNextEntry(entry);
+zipOutput.write(contentOfEntry);
+zipOutput.closeArchiveEntry();
+]]></source>
+
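+        <p>Reading entries from a zip archive:</p>
+<source><![CDATA[
+ZipArchiveEntry entry = zipInput.getNextZipEntry();
+// getSize() returns a long; the cast assumes the entry fits into a byte array
+byte[] content = new byte[(int) entry.getSize()];
+int offset = 0;
+while (offset < content.length) {
+    int n = zipInput.read(content, offset, content.length - offset);
+    if (n < 0) break; // premature end of stream
+    offset += n;
+}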
+]]></source>
+
+        <p>Reading entries from a zip archive using the
+          recommended <code>ZipFile</code> class:</p>
+<source><![CDATA[
+ZipArchiveEntry entry = zipFile.getEntry(name);
+InputStream content = zipFile.getInputStream(entry);
+try {
+    byte[] buffer = new byte[4096];
+    int n;
+    while ((n = content.read(buffer)) != -1) {
+        // process the first n bytes of buffer here
+    }
+} finally {
+    content.close();
+}
+]]></source>
+      </subsection>
+
+      <subsection name="jar">
+        <p>In general, JAR archives are ZIP files, so the JAR package
+          supports all options provided by the ZIP package.</p>
+
+        <p>To be interoperable JAR archives should always be created
+          using the UTF-8 encoding for file names (which is the
+          default).</p>
+
+        <p>Archives created using <code>JarArchiveOutputStream</code>
+          will implicitly add a <code>JarMarker</code> extra field to
+          the very first archive entry of the archive which will make
+          Solaris recognize them as Java archives and allow them to
+          be used as executables.</p>
+
+        <p>Note that <code>ArchiveStreamFactory</code> doesn't
+          distinguish ZIP archives from JAR archives, so if you use
+          the one-argument <code>createArchiveInputStream</code>
+          method on a JAR archive, it will still return the more
+          generic <code>ZipArchiveInputStream</code>.</p>
+
+        <p>The <code>JarArchiveEntry</code> class contains fields for
+          certificates and attributes that are planned to be supported
+          in the future but are not supported as of Compress 1.0.</p>
+
+        <p>Adding an entry to a jar archive:</p>
+<source><![CDATA[
+JarArchiveEntry entry = new JarArchiveEntry(name);
+entry.setSize(size);
+jarOutput.putNextEntry(entry);
+jarOutput.write(contentOfEntry);
+jarOutput.closeArchiveEntry();
+]]></source>
+
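+        <p>Reading entries from a jar archive:</p>
+<source><![CDATA[
+JarArchiveEntry entry = jarInput.getNextJarEntry();
+// getSize() returns a long; the cast assumes the entry fits into a byte array
+byte[] content = new byte[(int) entry.getSize()];
+int offset = 0;
+while (offset < content.length) {
+    int n = jarInput.read(content, offset, content.length - offset);
+    if (n < 0) break; // premature end of stream
+    offset += n;
+}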
+]]></source>
+      </subsection>
+
+      <subsection name="bzip2">
+
+        <p>Note that <code>BZip2CompressorOutputStream</code> keeps
+          hold of some big data structures in memory.  While it is
+          recommended for any stream that you close it as soon as
+          you no longer need it, this is even more important
+          for <code>BZip2CompressorOutputStream</code>.</p>
+
+        <p>Uncompressing a given bzip2 compressed file (you would
+          certainly add exception handling and make sure all streams
+          get closed properly):</p>
+<source><![CDATA[
+FileInputStream in = new FileInputStream("archive.tar.bz2");
+FileOutputStream out = new FileOutputStream("archive.tar");
+BZip2CompressorInputStream bzIn = new BZip2CompressorInputStream(in);
+final byte[] buffer = new byte[buffersize];
+int n = 0;
+while (-1 != (n = bzIn.read(buffer))) {
+    out.write(buffer, 0, n);
+}
+out.close();
+bzIn.close();
+]]></source>
+
+      </subsection>
+
+      <subsection name="gzip">
+
+        <p>The implementation of this package is provided by
+          the <code>java.util.zip</code> package of the Java class
+          library.</p>
+
+        <p>Uncompressing a given gzip compressed file (you would
+          certainly add exception handling and make sure all streams
+          get closed properly):</p>
+<source><![CDATA[
+FileInputStream in = new FileInputStream("archive.tar.gz");
+FileOutputStream out = new FileOutputStream("archive.tar");
+GzipCompressorInputStream gzIn = new GzipCompressorInputStream(in);
+final byte[] buffer = new byte[buffersize];
+int n = 0;
+while (-1 != (n = gzIn.read(buffer))) {
+    out.write(buffer, 0, n);
+}
+out.close();
+gzIn.close();
+]]></source>
+      </subsection>
+
+    </section>
+  </body>
+</document>

Propchange: commons/proper/compress/trunk/src/site/xdoc/examples.xml
------------------------------------------------------------------------------
    svn:eol-style = native

Modified: commons/proper/compress/trunk/src/site/xdoc/index.xml
URL: 
http://svn.apache.org/viewvc/commons/proper/compress/trunk/src/site/xdoc/index.xml?rev=759472&r1=759471&r2=759472&view=diff
==============================================================================
--- commons/proper/compress/trunk/src/site/xdoc/index.xml (original)
+++ commons/proper/compress/trunk/src/site/xdoc/index.xml Sat Mar 28 14:46:32 2009
@@ -56,7 +56,34 @@
             </subsection>
         </section>
         <section name="Documentation">
+          <p>The compress component is split into <em>compressors</em> and
+            <em>archivers</em>.  While <em>compressors</em>
+            (un)compress streams that usually store a single
+            entry, <em>archivers</em> deal with archives that contain
+            structured content represented
+            by <code>ArchiveEntry</code> instances which in turn
+            usually correspond to single files or directories.</p>
+
+          <p>Currently the bzip2 and gzip formats are supported as
+            compressors where gzip support is provided by
+            the <code>java.util.zip</code> package of the Java class
+            library.</p>
+
+          <p>The ar, cpio, tar and zip formats are supported as
+            archivers where the <a href="zip.html">zip</a>
+            implementation provides capabilities that go beyond the
+            features found in java.util.zip.</p>
+
+          <p>The compress component provides abstract base classes for
+            compressors and archivers together with factories that can
+            be used to choose implementations by algorithm name.  In
+            the case of input streams the factories can also be used
+            to guess the format and provide the matching
+            implementation.</p>
+
           <ul>
+            <li>The <a href="examples.html">examples page</a> contains
+            more detailed information and some examples.</li>
             <li>The <a href="apidocs/index.html">Javadoc</a> of the latest SVN</li>
             <li>The <a href="http://svn.apache.org/viewvc/commons/proper/compress/">SVN
                 repository</a> can be browsed.</li>

Added: commons/proper/compress/trunk/src/site/xdoc/zip.xml
URL: 
http://svn.apache.org/viewvc/commons/proper/compress/trunk/src/site/xdoc/zip.xml?rev=759472&view=auto
==============================================================================
--- commons/proper/compress/trunk/src/site/xdoc/zip.xml (added)
+++ commons/proper/compress/trunk/src/site/xdoc/zip.xml Sat Mar 28 14:46:32 2009
@@ -0,0 +1,226 @@
+<?xml version="1.0"?>
+<!--
+
+   Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements.  See the NOTICE file distributed with
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with
+   the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+
+-->
+<document>
+  <properties>
+    <title>Commons Compress ZIP package</title>
+    <author email="d...@commons.apache.org">Commons Documentation Team</author>
+  </properties>
+  <body>
+    <section name="The ZIP package">
+
+      <p>The ZIP package provides features not found
+        in <code>java.util.zip</code>:</p>
+
+      <ul>
+        <li>Support for encodings other than UTF-8 for filenames and
+          comments.</li>
+        <li>Access to internal and external attributes (which are used
+          to store Unix permissions by some zip implementations).</li>
+        <li>Structured support for extra fields.</li>
+      </ul>
+
+      <p>In addition to the information stored
+        in <code>ArchiveEntry</code> a <code>ZipArchiveEntry</code>
+        stores internal and external attributes as well as extra
+        fields which may contain information like Unix permissions,
+        information about the platform they've been created on, their
+        last modification time and an optional comment.</p>
+
+      <subsection name="ZipArchiveInputStream vs ZipFile">
+
+        <p>ZIP archives store archive entries in sequence and
+          contain a registry of all entries at the very end of the
+          archive.  It is acceptable for an archive to contain several
+          entries of the same name and have the registry (called the
+          central directory) decide which entry is actually to be used
+          (if any).</p>
+
+        <p>In addition the ZIP format stores certain information only
+          inside the central directory but not together with the entry
+          itself.  This applies to:</p>
+
+        <ul>
+          <li>internal and external attributes</li>
+          <li>different or additional extra fields</li>
+        </ul>
+
+        <p>This means the ZIP format cannot really be parsed
+          correctly while reading a non-seekable stream, which is what
+          <code>ZipArchiveInputStream</code> is forced to do.  As a
+          result <code>ZipArchiveInputStream</code></p>
+        <ul>
+          <li>may return entries that are not part of the central
+            directory at all and shouldn't be considered part of the
+            archive.</li>
+          <li>may return several entries with the same name.</li>
+          <li>will not return internal or external attributes.</li>
+          <li>may return incomplete extra field data.</li>
+        </ul>
+
+        <p><code>ZipArchiveInputStream</code> shares these limitations
+          with <code>java.util.zip.ZipInputStream</code>.</p>
+
+        <p><code>ZipFile</code> is able to read the central directory
+          first and provide correct and complete information on any
+          ZIP archive.</p>
+
+        <p>If possible, you should always prefer <code>ZipFile</code>
+          over <code>ZipArchiveInputStream</code>.</p>
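+
+        <p>A short sketch of listing the entries of an archive
+          via <code>ZipFile</code>; the archive name is a placeholder
+          and error handling is kept to a minimum:</p>
+<source><![CDATA[
+ZipFile zipFile = new ZipFile(new File("archive.zip"));
+try {
+    // getEntries() reflects the central directory
+    Enumeration entries = zipFile.getEntries();
+    while (entries.hasMoreElements()) {
+        ZipArchiveEntry entry = (ZipArchiveEntry) entries.nextElement();
+        System.out.println(entry.getName());
+    }
+} finally {
+    zipFile.close();
+}
+]]></source>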
+      </subsection>
+
+      <subsection name="Extra Fields">
+
+        <p>Inside a ZIP archive, additional data can be attached to
+          each entry.  The <code>java.util.zip.ZipEntry</code> class
+          provides access to this via the <code>get/setExtra</code>
+          methods as arrays of <code>byte</code>s.</p>
+
+        <p>Actually the extra data is supposed to be more structured
+          than that and Compress' ZIP package provides access to the
+          structured data as <code>ZipExtraField</code> instances.  Only
+          a subset of all defined extra field formats is supported by
+          the package; any other extra field will be stored
+          as <code>UnrecognizedExtraField</code>.</p>
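+
+        <p>As an illustration, the extra fields of an entry could be
+          inspected roughly like this.  The sketch
+          assumes <code>entry</code> is a <code>ZipArchiveEntry</code>
+          obtained from a <code>ZipFile</code> and that the
+          structured fields implement the
+          package's <code>ZipExtraField</code> interface:</p>
+<source><![CDATA[
+ZipExtraField[] fields = entry.getExtraFields();
+for (int i = 0; i < fields.length; i++) {
+    // the header id identifies the kind of extra field
+    int headerId = fields[i].getHeaderId().getValue();
+    boolean known = !(fields[i] instanceof UnrecognizedExtraField);
+    System.out.println("0x" + Integer.toHexString(headerId)
+                       + (known ? " (parsed)" : " (unrecognized)"));
+}
+]]></source>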
+
+      </subsection>
+
+      <subsection name="Encoding" id="encoding">
+
+        <p>Traditionally the ZIP archive format uses CodePage 437 as
+          encoding for file names, which is not sufficient for many
+          international character sets.</p>
+
+        <p>Over time different archivers have chosen different ways to
+          work around the limitation - the <code>java.util.zip</code>
+          package simply uses UTF-8 as its encoding, for example.</p>
+
+        <p>Ant has been offering the encoding attribute of the zip and
+          unzip task as a way to explicitly specify the encoding to
+          use (or expect) since Ant 1.4.  It defaults to the
+          platform's default encoding for zip and UTF-8 for jar and
+          other jar-like tasks (war, ear, ...) as well as the unzip
+          family of tasks.</p>
+
+        <p>More recent versions of the ZIP specification introduce
+          something called the &quot;language encoding flag&quot;
+          which can be used to signal that a file name has been
+          encoded using UTF-8.  All ZIP-archives written by Compress
+          will set this flag, if the encoding has been set to UTF-8.
+          Our interoperability tests with existing archivers didn't
+          show any ill effects (in fact, most archivers ignore the
+          flag to date), but you can turn off the "language encoding
+          flag" by setting the attribute
+          <code>useLanguageEncodingFlag</code> to <code>false</code> on the
+          <code>ZipArchiveOutputStream</code> if you should encounter
+          problems.</p>
+
+        <p>The <code>ZipFile</code>
+          and <code>ZipArchiveInputStream</code> classes will
+          recognize the language encoding flag and ignore the encoding
+          set in the constructor if the flag has been found.</p>
+
+        <p>The InfoZIP developers have introduced new ZIP extra fields
+          that can be used to add an additional UTF-8 encoded file
+          name to the entry's metadata.  Most archivers ignore these
+          extra fields.  <code>ZipArchiveOutputStream</code> supports
+          an option <code>createUnicodeExtraFields</code> which makes
+          it write these extra fields either for all entries
+          ("always") or only those whose name cannot be encoded using
+          the specified encoding ("not-encodeable").  It defaults to
+          "never" since the extra fields create bigger archives.</p>
+
+        <p>The <code>fallbackToUTF8</code> attribute
+          of <code>ZipArchiveOutputStream</code> can be used to create
+          archives that use the specified encoding in the majority of
+          cases but UTF-8 and the language encoding flag for filenames
+          that cannot be encoded using the specified encoding.</p>
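+
+        <p>A minimal sketch, assuming <code>zipOutput</code> is
+          a <code>ZipArchiveOutputStream</code> and "Cp437" is the
+          encoding chosen for the majority of names:</p>
+<source><![CDATA[
+zipOutput.setEncoding("Cp437");
+// use UTF-8 plus the language encoding flag for names Cp437 cannot represent
+zipOutput.setFallbackToUTF8(true);
+]]></source>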
+
+        <p>The <code>ZipFile</code>
+          and <code>ZipArchiveInputStream</code> classes recognize the
+          Unicode extra fields by default and read the file name
+          information from them, unless you set the constructor parameter
+          <code>scanForUnicodeExtraFields</code> to false.</p>
+
+        <h4>Recommendations for Interoperability</h4>
+
+        <p>The optimal setting of flags depends on the archivers you
+          expect as consumers/producers of the ZIP archives.  Below
+          are some test results which may be superseded by later
+          versions of each tool.</p>
+
+        <ul>
+          <li>The java.util.zip package used by the jar executable or
+            to read jars from your CLASSPATH reads and writes UTF-8
+            names; it doesn't set or recognize any flags or Unicode
+            extra fields.</li>
+
+          <li>7Zip writes CodePage 437 by default but uses UTF-8 and
+            the language encoding flag when writing entries that
+            cannot be encoded as CodePage 437 (similar to the zip task
+            with fallbacktoUTF8 set to true).  It recognizes the
+            language encoding flag when reading and ignores the
+            Unicode extra fields.</li>
+
+          <li>WinZIP writes CodePage 437 and uses Unicode extra fields
+            by default.  It recognizes the Unicode extra field and the
+            language encoding flag when reading.</li>
+
+          <li>Windows' "compressed folder" feature doesn't recognize
+            any flag or extra field and creates archives using the
+            platform's default encoding - and expects archives to be in
+            that encoding when reading them.</li>
+
+          <li>InfoZIP based tools can recognize and write both; this is
+            a compile-time option and depends on the platform, so your
+            mileage may vary.</li>
+
+          <li>PKWARE zip tools recognize both and prefer the language
+            encoding flag.  They create archives using CodePage 437 if
+            possible and UTF-8 plus the language encoding flag for
+            file names that cannot be encoded as CodePage 437.</li>
+        </ul>
+        
+        <p>So, what to do?</p>
+
+        <p>If you are creating jars, then java.util.zip is your main
+          consumer.  We recommend you set the encoding to UTF-8 and
+          keep the language encoding flag enabled.  The flag won't
+          help or hurt java.util.zip but archivers that support it
+          will show the correct file names.</p>
+
+        <p>For maximum interop it is probably best to set the encoding
+          to UTF-8, enable the language encoding flag and create
+          Unicode extra fields when writing ZIPs.  Such archives
+          should be extracted correctly by java.util.zip, 7Zip,
+          WinZIP, PKWARE tools and most likely InfoZIP tools.  They
+          will be unusable with Windows' "compressed folders" feature
+          and bigger than archives without the Unicode extra fields,
+          though.</p>
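+
+        <p>Expressed in code, this recommendation might look like the
+          following sketch, where <code>zipOutput</code> is assumed
+          to be a <code>ZipArchiveOutputStream</code>:</p>
+<source><![CDATA[
+zipOutput.setEncoding("UTF-8");
+zipOutput.setUseLanguageEncodingFlag(true);
+// write the Unicode extra fields for every entry
+zipOutput.setCreateUnicodeExtraFields(
+    ZipArchiveOutputStream.UnicodeExtraFieldPolicy.ALWAYS);
+]]></source>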
+
+        <p>If Windows' "compressed folders" is your primary consumer,
+          then your best option is to explicitly set the encoding to
+          that of the target platform.  You may want to enable creation of
+          Unicode extra fields so the tools that support them will
+          extract the file names correctly.</p>
+      </subsection>
+
+    </section>
+  </body>
+</document>

Propchange: commons/proper/compress/trunk/src/site/xdoc/zip.xml
------------------------------------------------------------------------------
    svn:eol-style = native

svn commit: r759472 - in /commons/proper/compress/trunk/src/site: site.xml xdoc/examples.xml xdoc/index.xml xdoc/zip.xml
