Author: gabor
Date: Wed May 30 11:01:59 2018
New Revision: 1832532

URL: http://svn.apache.org/viewvc?rev=1832532&view=rev
Log:
PARQUET-1244: Documentation link to logical types broken
Author: Nandor Kollar <[email protected]> Modified: parquet/site/publish/adopters/index.html parquet/site/publish/bylaws/index.html parquet/site/publish/community/index.html parquet/site/publish/contribute/index.html parquet/site/publish/developers/index.html parquet/site/publish/documentation/how-to-release/index.html parquet/site/publish/documentation/latest/index.html parquet/site/publish/downloads/index.html parquet/site/publish/index.html parquet/site/publish/presentations/index.html parquet/site/source/_footer.md.erb parquet/site/source/_header.md.erb parquet/site/source/community.html.md parquet/site/source/documentation/latest.html.md Modified: parquet/site/publish/adopters/index.html URL: http://svn.apache.org/viewvc/parquet/site/publish/adopters/index.html?rev=1832532&r1=1832531&r2=1832532&view=diff ============================================================================== --- parquet/site/publish/adopters/index.html (original) +++ parquet/site/publish/adopters/index.html Wed May 30 11:01:59 2018 @@ -51,12 +51,11 @@ <a href="/community">Get Involved <span class="caret"></span></a> <ul class="dropdown-menu" role="menu" aria-labelledby="drop1"> <li role="presentation"><a role="menuitem" tabindex="-1" href="http://issues.apache.org/jira/browse/parquet"><i class="fa fa-bug"></i> JIRA (Bugs)</a></li> - <li role="presentation"><a role="menuitem" tabindex="-1" href="https://git-wip-us.apache.org/repos/asf?s=parquet"><i class="fa fa-code"></i> Source (Apache)</a></li> + <li role="presentation"><a role="menuitem" tabindex="-1" href="https://gitbox.apache.org/repos/asf?p=parquet-mr.git"><i class="fa fa-code"></i> Source (Apache)</a></li> <li role="presentation"><a role="menuitem" tabindex="-1" href="https://github.com/apache/parquet-mr"><i class="fa fa-github-alt"></i> Source (GitHub)</a></li> <li role="presentation"><a role="menuitem" tabindex="-1" href="/contribute"><i class="fa fa-code-fork"></i> Contributing</a></li> <li role="presentation"><a 
role="menuitem" tabindex="-1" href="https://twitter.com/ApacheParquet"><i class="fa fa-twitter"></i> @ApacheParquet</a></li> <li role="presentation"><a role="menuitem" tabindex="-1" href="http://stackoverflow.com/questions/tagged/parquet"><i class="fa fa-stack-overflow"></i> StackOverflow</a></li> - <li role="presentation"><a role="menuitem" tabindex="-1" href="https://plus.google.com/events/c36apc97f7invko9p128hq9e6b8"><i class="fa fa-google"></i> Google Hangout</a></li> </ul> </li> </ul> @@ -158,7 +157,7 @@ <div class="row-fluid"> <div class="span12 text-left"> <div class="span12"> - Copyright 2014 <a href="http://www.apache.org/">Apache Software Foundation</a>. Licensed under the <a href="http://www.apache.org/licenses/">Apache License v2.0</a>. Apache Parquet and the Apache feather logo are trademarks of The Apache Software Foundation. + Copyright 2018 <a href="http://www.apache.org/">Apache Software Foundation</a>. Licensed under the <a href="http://www.apache.org/licenses/">Apache License v2.0</a>. Apache Parquet and the Apache feather logo are trademarks of The Apache Software Foundation. 
</div> </div> Modified: parquet/site/publish/bylaws/index.html URL: http://svn.apache.org/viewvc/parquet/site/publish/bylaws/index.html?rev=1832532&r1=1832531&r2=1832532&view=diff ============================================================================== --- parquet/site/publish/bylaws/index.html (original) +++ parquet/site/publish/bylaws/index.html Wed May 30 11:01:59 2018 @@ -51,12 +51,11 @@ <a href="/community">Get Involved <span class="caret"></span></a> <ul class="dropdown-menu" role="menu" aria-labelledby="drop1"> <li role="presentation"><a role="menuitem" tabindex="-1" href="http://issues.apache.org/jira/browse/parquet"><i class="fa fa-bug"></i> JIRA (Bugs)</a></li> - <li role="presentation"><a role="menuitem" tabindex="-1" href="https://git-wip-us.apache.org/repos/asf?s=parquet"><i class="fa fa-code"></i> Source (Apache)</a></li> + <li role="presentation"><a role="menuitem" tabindex="-1" href="https://gitbox.apache.org/repos/asf?p=parquet-mr.git"><i class="fa fa-code"></i> Source (Apache)</a></li> <li role="presentation"><a role="menuitem" tabindex="-1" href="https://github.com/apache/parquet-mr"><i class="fa fa-github-alt"></i> Source (GitHub)</a></li> <li role="presentation"><a role="menuitem" tabindex="-1" href="/contribute"><i class="fa fa-code-fork"></i> Contributing</a></li> <li role="presentation"><a role="menuitem" tabindex="-1" href="https://twitter.com/ApacheParquet"><i class="fa fa-twitter"></i> @ApacheParquet</a></li> <li role="presentation"><a role="menuitem" tabindex="-1" href="http://stackoverflow.com/questions/tagged/parquet"><i class="fa fa-stack-overflow"></i> StackOverflow</a></li> - <li role="presentation"><a role="menuitem" tabindex="-1" href="https://plus.google.com/events/c36apc97f7invko9p128hq9e6b8"><i class="fa fa-google"></i> Google Hangout</a></li> </ul> </li> </ul> @@ -132,7 +131,7 @@ <div class="row-fluid"> <div class="span12 text-left"> <div class="span12"> - Copyright 2014 <a href="http://www.apache.org/">Apache Software 
Foundation</a>. Licensed under the <a href="http://www.apache.org/licenses/">Apache License v2.0</a>. Apache Parquet and the Apache feather logo are trademarks of The Apache Software Foundation. + Copyright 2018 <a href="http://www.apache.org/">Apache Software Foundation</a>. Licensed under the <a href="http://www.apache.org/licenses/">Apache License v2.0</a>. Apache Parquet and the Apache feather logo are trademarks of The Apache Software Foundation. </div> </div> Modified: parquet/site/publish/community/index.html URL: http://svn.apache.org/viewvc/parquet/site/publish/community/index.html?rev=1832532&r1=1832531&r2=1832532&view=diff ============================================================================== --- parquet/site/publish/community/index.html (original) +++ parquet/site/publish/community/index.html Wed May 30 11:01:59 2018 @@ -51,12 +51,11 @@ <a href="/community">Get Involved <span class="caret"></span></a> <ul class="dropdown-menu" role="menu" aria-labelledby="drop1"> <li role="presentation"><a role="menuitem" tabindex="-1" href="http://issues.apache.org/jira/browse/parquet"><i class="fa fa-bug"></i> JIRA (Bugs)</a></li> - <li role="presentation"><a role="menuitem" tabindex="-1" href="https://git-wip-us.apache.org/repos/asf?s=parquet"><i class="fa fa-code"></i> Source (Apache)</a></li> + <li role="presentation"><a role="menuitem" tabindex="-1" href="https://gitbox.apache.org/repos/asf?p=parquet-mr.git"><i class="fa fa-code"></i> Source (Apache)</a></li> <li role="presentation"><a role="menuitem" tabindex="-1" href="https://github.com/apache/parquet-mr"><i class="fa fa-github-alt"></i> Source (GitHub)</a></li> <li role="presentation"><a role="menuitem" tabindex="-1" href="/contribute"><i class="fa fa-code-fork"></i> Contributing</a></li> <li role="presentation"><a role="menuitem" tabindex="-1" href="https://twitter.com/ApacheParquet"><i class="fa fa-twitter"></i> @ApacheParquet</a></li> <li role="presentation"><a role="menuitem" tabindex="-1" 
href="http://stackoverflow.com/questions/tagged/parquet"><i class="fa fa-stack-overflow"></i> StackOverflow</a></li> - <li role="presentation"><a role="menuitem" tabindex="-1" href="https://plus.google.com/events/c36apc97f7invko9p128hq9e6b8"><i class="fa fa-google"></i> Google Hangout</a></li> </ul> </li> </ul> @@ -135,7 +134,7 @@ <h4 name="contribute">Contribute a core patch</h4> <p>Follow our <a href="/contribute/">contribution guidelines</a> when submitting a patch.</p> <h3>Stay in Touch</h3> - <p>Attend a Parquet <a href="https://plus.google.com/events/c36apc97f7invko9p128hq9e6b8">sync up meeting</a> via Google Hangouts.</p> + <p>Attend a Parquet sync up meeting via Google Hangouts. These meetings are hold ad-hoc, announced on the developer list about a week before the meeting.</p> <h3>Bylaws</h3> <p><a href="/bylaws/">Parquet community bylaws</a></p> @@ -159,7 +158,7 @@ <div class="row-fluid"> <div class="span12 text-left"> <div class="span12"> - Copyright 2014 <a href="http://www.apache.org/">Apache Software Foundation</a>. Licensed under the <a href="http://www.apache.org/licenses/">Apache License v2.0</a>. Apache Parquet and the Apache feather logo are trademarks of The Apache Software Foundation. + Copyright 2018 <a href="http://www.apache.org/">Apache Software Foundation</a>. Licensed under the <a href="http://www.apache.org/licenses/">Apache License v2.0</a>. Apache Parquet and the Apache feather logo are trademarks of The Apache Software Foundation. 
</div> </div> Modified: parquet/site/publish/contribute/index.html URL: http://svn.apache.org/viewvc/parquet/site/publish/contribute/index.html?rev=1832532&r1=1832531&r2=1832532&view=diff ============================================================================== --- parquet/site/publish/contribute/index.html (original) +++ parquet/site/publish/contribute/index.html Wed May 30 11:01:59 2018 @@ -51,12 +51,11 @@ <a href="/community">Get Involved <span class="caret"></span></a> <ul class="dropdown-menu" role="menu" aria-labelledby="drop1"> <li role="presentation"><a role="menuitem" tabindex="-1" href="http://issues.apache.org/jira/browse/parquet"><i class="fa fa-bug"></i> JIRA (Bugs)</a></li> - <li role="presentation"><a role="menuitem" tabindex="-1" href="https://git-wip-us.apache.org/repos/asf?s=parquet"><i class="fa fa-code"></i> Source (Apache)</a></li> + <li role="presentation"><a role="menuitem" tabindex="-1" href="https://gitbox.apache.org/repos/asf?p=parquet-mr.git"><i class="fa fa-code"></i> Source (Apache)</a></li> <li role="presentation"><a role="menuitem" tabindex="-1" href="https://github.com/apache/parquet-mr"><i class="fa fa-github-alt"></i> Source (GitHub)</a></li> <li role="presentation"><a role="menuitem" tabindex="-1" href="/contribute"><i class="fa fa-code-fork"></i> Contributing</a></li> <li role="presentation"><a role="menuitem" tabindex="-1" href="https://twitter.com/ApacheParquet"><i class="fa fa-twitter"></i> @ApacheParquet</a></li> <li role="presentation"><a role="menuitem" tabindex="-1" href="http://stackoverflow.com/questions/tagged/parquet"><i class="fa fa-stack-overflow"></i> StackOverflow</a></li> - <li role="presentation"><a role="menuitem" tabindex="-1" href="https://plus.google.com/events/c36apc97f7invko9p128hq9e6b8"><i class="fa fa-google"></i> Google Hangout</a></li> </ul> </li> </ul> @@ -238,7 +237,7 @@ the live site. 
</p> <div class="row-fluid"> <div class="span12 text-left"> <div class="span12"> - Copyright 2014 <a href="http://www.apache.org/">Apache Software Foundation</a>. Licensed under the <a href="http://www.apache.org/licenses/">Apache License v2.0</a>. Apache Parquet and the Apache feather logo are trademarks of The Apache Software Foundation. + Copyright 2018 <a href="http://www.apache.org/">Apache Software Foundation</a>. Licensed under the <a href="http://www.apache.org/licenses/">Apache License v2.0</a>. Apache Parquet and the Apache feather logo are trademarks of The Apache Software Foundation. </div> </div> Modified: parquet/site/publish/developers/index.html URL: http://svn.apache.org/viewvc/parquet/site/publish/developers/index.html?rev=1832532&r1=1832531&r2=1832532&view=diff ============================================================================== --- parquet/site/publish/developers/index.html (original) +++ parquet/site/publish/developers/index.html Wed May 30 11:01:59 2018 @@ -51,12 +51,11 @@ <a href="/community">Get Involved <span class="caret"></span></a> <ul class="dropdown-menu" role="menu" aria-labelledby="drop1"> <li role="presentation"><a role="menuitem" tabindex="-1" href="http://issues.apache.org/jira/browse/parquet"><i class="fa fa-bug"></i> JIRA (Bugs)</a></li> - <li role="presentation"><a role="menuitem" tabindex="-1" href="https://git-wip-us.apache.org/repos/asf?s=parquet"><i class="fa fa-code"></i> Source (Apache)</a></li> + <li role="presentation"><a role="menuitem" tabindex="-1" href="https://gitbox.apache.org/repos/asf?p=parquet-mr.git"><i class="fa fa-code"></i> Source (Apache)</a></li> <li role="presentation"><a role="menuitem" tabindex="-1" href="https://github.com/apache/parquet-mr"><i class="fa fa-github-alt"></i> Source (GitHub)</a></li> <li role="presentation"><a role="menuitem" tabindex="-1" href="/contribute"><i class="fa fa-code-fork"></i> Contributing</a></li> <li role="presentation"><a role="menuitem" tabindex="-1" 
href="https://twitter.com/ApacheParquet"><i class="fa fa-twitter"></i> @ApacheParquet</a></li> <li role="presentation"><a role="menuitem" tabindex="-1" href="http://stackoverflow.com/questions/tagged/parquet"><i class="fa fa-stack-overflow"></i> StackOverflow</a></li> - <li role="presentation"><a role="menuitem" tabindex="-1" href="https://plus.google.com/events/c36apc97f7invko9p128hq9e6b8"><i class="fa fa-google"></i> Google Hangout</a></li> </ul> </li> </ul> @@ -137,7 +136,7 @@ <div class="row-fluid"> <div class="span12 text-left"> <div class="span12"> - Copyright 2014 <a href="http://www.apache.org/">Apache Software Foundation</a>. Licensed under the <a href="http://www.apache.org/licenses/">Apache License v2.0</a>. Apache Parquet and the Apache feather logo are trademarks of The Apache Software Foundation. + Copyright 2018 <a href="http://www.apache.org/">Apache Software Foundation</a>. Licensed under the <a href="http://www.apache.org/licenses/">Apache License v2.0</a>. Apache Parquet and the Apache feather logo are trademarks of The Apache Software Foundation. 
</div> </div> Modified: parquet/site/publish/documentation/how-to-release/index.html URL: http://svn.apache.org/viewvc/parquet/site/publish/documentation/how-to-release/index.html?rev=1832532&r1=1832531&r2=1832532&view=diff ============================================================================== --- parquet/site/publish/documentation/how-to-release/index.html (original) +++ parquet/site/publish/documentation/how-to-release/index.html Wed May 30 11:01:59 2018 @@ -51,12 +51,11 @@ <a href="/community">Get Involved <span class="caret"></span></a> <ul class="dropdown-menu" role="menu" aria-labelledby="drop1"> <li role="presentation"><a role="menuitem" tabindex="-1" href="http://issues.apache.org/jira/browse/parquet"><i class="fa fa-bug"></i> JIRA (Bugs)</a></li> - <li role="presentation"><a role="menuitem" tabindex="-1" href="https://git-wip-us.apache.org/repos/asf?s=parquet"><i class="fa fa-code"></i> Source (Apache)</a></li> + <li role="presentation"><a role="menuitem" tabindex="-1" href="https://gitbox.apache.org/repos/asf?p=parquet-mr.git"><i class="fa fa-code"></i> Source (Apache)</a></li> <li role="presentation"><a role="menuitem" tabindex="-1" href="https://github.com/apache/parquet-mr"><i class="fa fa-github-alt"></i> Source (GitHub)</a></li> <li role="presentation"><a role="menuitem" tabindex="-1" href="/contribute"><i class="fa fa-code-fork"></i> Contributing</a></li> <li role="presentation"><a role="menuitem" tabindex="-1" href="https://twitter.com/ApacheParquet"><i class="fa fa-twitter"></i> @ApacheParquet</a></li> <li role="presentation"><a role="menuitem" tabindex="-1" href="http://stackoverflow.com/questions/tagged/parquet"><i class="fa fa-stack-overflow"></i> StackOverflow</a></li> - <li role="presentation"><a role="menuitem" tabindex="-1" href="https://plus.google.com/events/c36apc97f7invko9p128hq9e6b8"><i class="fa fa-google"></i> Google Hangout</a></li> </ul> </li> </ul> @@ -271,7 +270,7 @@ git push apache :apache-parquet-<VERS <div 
class="row-fluid"> <div class="span12 text-left"> <div class="span12"> - Copyright 2014 <a href="http://www.apache.org/">Apache Software Foundation</a>. Licensed under the <a href="http://www.apache.org/licenses/">Apache License v2.0</a>. Apache Parquet and the Apache feather logo are trademarks of The Apache Software Foundation. + Copyright 2018 <a href="http://www.apache.org/">Apache Software Foundation</a>. Licensed under the <a href="http://www.apache.org/licenses/">Apache License v2.0</a>. Apache Parquet and the Apache feather logo are trademarks of The Apache Software Foundation. </div> </div> Modified: parquet/site/publish/documentation/latest/index.html URL: http://svn.apache.org/viewvc/parquet/site/publish/documentation/latest/index.html?rev=1832532&r1=1832531&r2=1832532&view=diff ============================================================================== --- parquet/site/publish/documentation/latest/index.html (original) +++ parquet/site/publish/documentation/latest/index.html Wed May 30 11:01:59 2018 @@ -51,12 +51,11 @@ <a href="/community">Get Involved <span class="caret"></span></a> <ul class="dropdown-menu" role="menu" aria-labelledby="drop1"> <li role="presentation"><a role="menuitem" tabindex="-1" href="http://issues.apache.org/jira/browse/parquet"><i class="fa fa-bug"></i> JIRA (Bugs)</a></li> - <li role="presentation"><a role="menuitem" tabindex="-1" href="https://git-wip-us.apache.org/repos/asf?s=parquet"><i class="fa fa-code"></i> Source (Apache)</a></li> + <li role="presentation"><a role="menuitem" tabindex="-1" href="https://gitbox.apache.org/repos/asf?p=parquet-mr.git"><i class="fa fa-code"></i> Source (Apache)</a></li> <li role="presentation"><a role="menuitem" tabindex="-1" href="https://github.com/apache/parquet-mr"><i class="fa fa-github-alt"></i> Source (GitHub)</a></li> <li role="presentation"><a role="menuitem" tabindex="-1" href="/contribute"><i class="fa fa-code-fork"></i> Contributing</a></li> <li role="presentation"><a 
role="menuitem" tabindex="-1" href="https://twitter.com/ApacheParquet"><i class="fa fa-twitter"></i> @ApacheParquet</a></li> <li role="presentation"><a role="menuitem" tabindex="-1" href="http://stackoverflow.com/questions/tagged/parquet"><i class="fa fa-stack-overflow"></i> StackOverflow</a></li> - <li role="presentation"><a role="menuitem" tabindex="-1" href="https://plus.google.com/events/c36apc97f7invko9p128hq9e6b8"><i class="fa fa-google"></i> Google Hangout</a></li> </ul> </li> </ul> @@ -119,7 +118,7 @@ <p>We created Parquet to make the advantages of compressed, efficient columnar data representation available to any project in the Hadoop ecosystem.</p> -<p>Parquet is built from the ground up with complex nested data structures in mind, and uses the <a href="https://github.com/Parquet/parquet-mr/wiki/The-striping-and-assembly-algorithms-from-the-Dremel-paper">record shredding and assembly algorithm</a> described in the Dremel paper. We believe this approach is superior to simple flattening of nested name spaces.</p> +<p>Parquet is built from the ground up with complex nested data structures in mind, and uses the <a href="https://github.com/julienledem/redelm/wiki/The-striping-and-assembly-algorithms-from-the-Dremel-paper">record shredding and assembly algorithm</a> described in the Dremel paper. We believe this approach is superior to simple flattening of nested name spaces.</p> <p>Parquet is built to support very efficient compression and encoding schemes. Multiple projects have demonstrated the performance impact of applying the right compression and encoding scheme to the data. 
Parquet allows compression schemes to be specified on a per-column level, and is future-proofed to allow adding more encodings as they are invented and implemented.</p> @@ -127,11 +126,15 @@ <h2 id="modules">Modules</h2> -<p>The <code>parquet-format</code> project contains format specifications and Thrift definitions of metadata required to properly read Parquet files.</p> +<p>The <a href="https://github.com/apache/parquet-format">parquet-format</a> project contains format specifications and Thrift definitions of metadata required to properly read Parquet files.</p> -<p>The <code>parquet-mr</code> project contains multiple sub-modules, which implement the core components of reading and writing a nested, column-oriented data stream, map this core onto the parquet format, and provide Hadoop Input/Output Formats, Pig loaders, and other java-based utilities for interacting with Parquet.</p> +<p>The <a href="https://github.com/apache/parquet-mr">parquet-mr</a> project contains multiple sub-modules, which implement the core components of reading and writing a nested, column-oriented data stream, map this core onto the parquet format, and provide Hadoop Input/Output Formats, Pig loaders, and other Java-based utilities for interacting with Parquet.</p> -<p>The <code>parquet-compatibility</code> project contains compatibility tests that can be used to verify that implementations in different languages can read and write each other’s files.</p> +<p>The <a href="https://github.com/apache/parquet-cpp">parquet-cpp</a> project is a C++ library to read-write Parquet files.</p> + +<p>The <a href="https://github.com/sunchao/parquet-rs">parquet-rs</a> project is a Rust library to read-write Parquet files.</p> + +<p>The <a href="https://github.com/Parquet/parquet-compatibility">parquet-compatibility</a> project contains compatibility tests that can be used to verify that implementations in different languages can read and write each other’s files.</p> <h2 id="building">Building</h2> 
@@ -206,29 +209,32 @@ in the thrift files.</p> <p>Readers are expected to first read the file metadata to find all the column chunks they are interested in. The columns chunks should then be read sequentially.</p> -<p><img alt="File Layout" src="https://raw.github.com/Parquet/parquet-format/master/doc/images/FileLayout.gif" /></p> +<p><img alt="File Layout" src="https://raw.github.com/apache/parquet-format/master/doc/images/FileLayout.gif" /></p> <h2 id="metadata">Metadata</h2> <p>There are three types of metadata: file metadata, column (chunk) metadata and page header metadata. All thrift structures are serialized using the TCompactProtocol.</p> -<p><img alt="Metadata diagram" src="https://github.com/Parquet/parquet-format/raw/master/doc/images/FileFormat.gif" /></p> +<p><img alt="Metadata diagram" src="https://github.com/apache/parquet-format/raw/master/doc/images/FileFormat.gif" /></p> <h2 id="types">Types</h2> <p>The types supported by the file format are intended to be as minimal as possible, -with a focus on how the types effect on disk storage. For example, 16-bit ints +with a focus on how the types effect on disk storage. For example, 16-bit ints are not explicitly supported in the storage format since they are covered by -32-bit ints with an efficient encoding. This reduces the complexity of implementing -readers and writers for the format. The types are: - - BOOLEAN: 1 bit boolean - - INT32: 32 bit signed ints - - INT64: 64 bit signed ints - - INT96: 96 bit signed ints - - FLOAT: IEEE 32-bit floating point values - - DOUBLE: IEEE 64-bit floating point values - - BYTE_ARRAY: arbitrarily long byte arrays.</p> +32-bit ints with an efficient encoding. This reduces the complexity of implementing +readers and writers for the format. 
The types are:</p> + +<ul> +<li><em>BOOLEAN</em>: 1 bit boolean</li> +<li><em>INT32</em>: 32 bit signed ints</li> +<li><em>INT64</em>: 64 bit signed ints</li> +<li><em>INT96</em>: 96 bit signed ints</li> +<li><em>FLOAT</em>: IEEE 32-bit floating point values</li> +<li><em>DOUBLE</em>: IEEE 64-bit floating point values</li> +<li><em>BYTE_ARRAY</em>: arbitrarily long byte arrays.</li> +</ul> <h3 id="logical-types">Logical Types</h3> @@ -239,7 +245,7 @@ example, strings are stored as byte arra These annotations define how to further decode and interpret the data. Annotations are stored as a <code>ConvertedType</code> in the file metadata and are documented in -<a href="https://github.com/Parquet/parquet-format/blob/master/LogicalTypes.md">LogicalTypes.md</a>.</p> +<a href="https://github.com/apache/parquet-format/blob/master/LogicalTypes.md">LogicalTypes.md</a>.</p> <h2 id="nested-encoding">Nested Encoding</h2> @@ -263,11 +269,14 @@ nothing else. </p> <h2 id="data-pages">Data Pages</h2> <p>For data pages, the 3 pieces of information are encoded back to back, after the page -header. We have the - - definition levels data,<br> - - repetition levels data, - - encoded values. -The size of specified in the header is for all 3 pieces combined.</p> +header. We have the </p> + +<ul> +<li>definition levels data,<br></li> +<li>repetition levels data, </li> +<li>encoded values. +The size of specified in the header is for all 3 pieces combined.</li> +</ul> <p>The data for the data page is always required. The definition and repetition levels are optional, based on the schema definition. If the column is not nested (i.e. 
@@ -278,7 +287,7 @@ skipped (if encoded, it will always have <p>For example, in the case where the column is non-nested and required, the data in the page is only the encoded values.</p> -<p>The supported encodings are described in <a href="https://github.com/Parquet/parquet-format/blob/master/Encodings.md">Encodings.md</a></p> +<p>The supported encodings are described in <a href="https://github.com/apache/parquet-format/blob/master/Encodings.md">Encodings.md</a></p> <h2 id="column-chunks">Column chunks</h2> @@ -310,7 +319,7 @@ a reader could recover partially written <h2 id="separating-metadata-and-column-data.">Separating metadata and column data.</h2> -<p>The format is explicitly designed to separate the metadata from the data. This +<p>The format is explicitly designed to separate the metadata from the data. This allows splitting columns into multiple files, as well as having a single metadata file reference multiple parquet files. </p> @@ -333,10 +342,13 @@ at a time; this is not the IO chunk. We <h2 id="extensibility">Extensibility</h2> -<p>There are many places in the format for compatible extensions: -- File Version: The file metadata contains a version. -- Encodings: Encodings are specified by enum and more can be added in the future.<br> -- Page types: Additional page types can be added and safely skipped.</p> +<p>There are many places in the format for compatible extensions:</p> + +<ul> +<li>File Version: The file metadata contains a version.</li> +<li>Encodings: Encodings are specified by enum and more can be added in the future.<br></li> +<li>Page types: Additional page types can be added and safely skipped.</li> +</ul> </div> <div class="container"> @@ -345,7 +357,7 @@ at a time; this is not the IO chunk. We <div class="row-fluid"> <div class="span12 text-left"> <div class="span12"> - Copyright 2014 <a href="http://www.apache.org/">Apache Software Foundation</a>. Licensed under the <a href="http://www.apache.org/licenses/">Apache License v2.0</a>. 
Apache Parquet and the Apache feather logo are trademarks of The Apache Software Foundation. + Copyright 2018 <a href="http://www.apache.org/">Apache Software Foundation</a>. Licensed under the <a href="http://www.apache.org/licenses/">Apache License v2.0</a>. Apache Parquet and the Apache feather logo are trademarks of The Apache Software Foundation. </div> </div> Modified: parquet/site/publish/downloads/index.html URL: http://svn.apache.org/viewvc/parquet/site/publish/downloads/index.html?rev=1832532&r1=1832531&r2=1832532&view=diff ============================================================================== --- parquet/site/publish/downloads/index.html (original) +++ parquet/site/publish/downloads/index.html Wed May 30 11:01:59 2018 @@ -51,12 +51,11 @@ <a href="/community">Get Involved <span class="caret"></span></a> <ul class="dropdown-menu" role="menu" aria-labelledby="drop1"> <li role="presentation"><a role="menuitem" tabindex="-1" href="http://issues.apache.org/jira/browse/parquet"><i class="fa fa-bug"></i> JIRA (Bugs)</a></li> - <li role="presentation"><a role="menuitem" tabindex="-1" href="https://git-wip-us.apache.org/repos/asf?s=parquet"><i class="fa fa-code"></i> Source (Apache)</a></li> + <li role="presentation"><a role="menuitem" tabindex="-1" href="https://gitbox.apache.org/repos/asf?p=parquet-mr.git"><i class="fa fa-code"></i> Source (Apache)</a></li> <li role="presentation"><a role="menuitem" tabindex="-1" href="https://github.com/apache/parquet-mr"><i class="fa fa-github-alt"></i> Source (GitHub)</a></li> <li role="presentation"><a role="menuitem" tabindex="-1" href="/contribute"><i class="fa fa-code-fork"></i> Contributing</a></li> <li role="presentation"><a role="menuitem" tabindex="-1" href="https://twitter.com/ApacheParquet"><i class="fa fa-twitter"></i> @ApacheParquet</a></li> <li role="presentation"><a role="menuitem" tabindex="-1" href="http://stackoverflow.com/questions/tagged/parquet"><i class="fa fa-stack-overflow"></i> 
StackOverflow</a></li> - <li role="presentation"><a role="menuitem" tabindex="-1" href="https://plus.google.com/events/c36apc97f7invko9p128hq9e6b8"><i class="fa fa-google"></i> Google Hangout</a></li> </ul> </li> </ul> @@ -146,7 +145,7 @@ <div class="row-fluid"> <div class="span12 text-left"> <div class="span12"> - Copyright 2014 <a href="http://www.apache.org/">Apache Software Foundation</a>. Licensed under the <a href="http://www.apache.org/licenses/">Apache License v2.0</a>. Apache Parquet and the Apache feather logo are trademarks of The Apache Software Foundation. + Copyright 2018 <a href="http://www.apache.org/">Apache Software Foundation</a>. Licensed under the <a href="http://www.apache.org/licenses/">Apache License v2.0</a>. Apache Parquet and the Apache feather logo are trademarks of The Apache Software Foundation. </div> </div> Modified: parquet/site/publish/index.html URL: http://svn.apache.org/viewvc/parquet/site/publish/index.html?rev=1832532&r1=1832531&r2=1832532&view=diff ============================================================================== --- parquet/site/publish/index.html (original) +++ parquet/site/publish/index.html Wed May 30 11:01:59 2018 @@ -51,12 +51,11 @@ <a href="/community">Get Involved <span class="caret"></span></a> <ul class="dropdown-menu" role="menu" aria-labelledby="drop1"> <li role="presentation"><a role="menuitem" tabindex="-1" href="http://issues.apache.org/jira/browse/parquet"><i class="fa fa-bug"></i> JIRA (Bugs)</a></li> - <li role="presentation"><a role="menuitem" tabindex="-1" href="https://git-wip-us.apache.org/repos/asf?s=parquet"><i class="fa fa-code"></i> Source (Apache)</a></li> + <li role="presentation"><a role="menuitem" tabindex="-1" href="https://gitbox.apache.org/repos/asf?p=parquet-mr.git"><i class="fa fa-code"></i> Source (Apache)</a></li> <li role="presentation"><a role="menuitem" tabindex="-1" href="https://github.com/apache/parquet-mr"><i class="fa fa-github-alt"></i> Source (GitHub)</a></li> <li 
role="presentation"><a role="menuitem" tabindex="-1" href="/contribute"><i class="fa fa-code-fork"></i> Contributing</a></li> <li role="presentation"><a role="menuitem" tabindex="-1" href="https://twitter.com/ApacheParquet"><i class="fa fa-twitter"></i> @ApacheParquet</a></li> <li role="presentation"><a role="menuitem" tabindex="-1" href="http://stackoverflow.com/questions/tagged/parquet"><i class="fa fa-stack-overflow"></i> StackOverflow</a></li> - <li role="presentation"><a role="menuitem" tabindex="-1" href="https://plus.google.com/events/c36apc97f7invko9p128hq9e6b8"><i class="fa fa-google"></i> Google Hangout</a></li> </ul> </li> </ul> @@ -148,7 +147,7 @@ <div class="row-fluid"> <div class="span12 text-left"> <div class="span12"> - Copyright 2014 <a href="http://www.apache.org/">Apache Software Foundation</a>. Licensed under the <a href="http://www.apache.org/licenses/">Apache License v2.0</a>. Apache Parquet and the Apache feather logo are trademarks of The Apache Software Foundation. + Copyright 2018 <a href="http://www.apache.org/">Apache Software Foundation</a>. Licensed under the <a href="http://www.apache.org/licenses/">Apache License v2.0</a>. Apache Parquet and the Apache feather logo are trademarks of The Apache Software Foundation. 
</div> </div> Modified: parquet/site/publish/presentations/index.html URL: http://svn.apache.org/viewvc/parquet/site/publish/presentations/index.html?rev=1832532&r1=1832531&r2=1832532&view=diff ============================================================================== --- parquet/site/publish/presentations/index.html (original) +++ parquet/site/publish/presentations/index.html Wed May 30 11:01:59 2018 @@ -51,12 +51,11 @@ <a href="/community">Get Involved <span class="caret"></span></a> <ul class="dropdown-menu" role="menu" aria-labelledby="drop1"> <li role="presentation"><a role="menuitem" tabindex="-1" href="http://issues.apache.org/jira/browse/parquet"><i class="fa fa-bug"></i> JIRA (Bugs)</a></li> - <li role="presentation"><a role="menuitem" tabindex="-1" href="https://git-wip-us.apache.org/repos/asf?s=parquet"><i class="fa fa-code"></i> Source (Apache)</a></li> + <li role="presentation"><a role="menuitem" tabindex="-1" href="https://gitbox.apache.org/repos/asf?p=parquet-mr.git"><i class="fa fa-code"></i> Source (Apache)</a></li> <li role="presentation"><a role="menuitem" tabindex="-1" href="https://github.com/apache/parquet-mr"><i class="fa fa-github-alt"></i> Source (GitHub)</a></li> <li role="presentation"><a role="menuitem" tabindex="-1" href="/contribute"><i class="fa fa-code-fork"></i> Contributing</a></li> <li role="presentation"><a role="menuitem" tabindex="-1" href="https://twitter.com/ApacheParquet"><i class="fa fa-twitter"></i> @ApacheParquet</a></li> <li role="presentation"><a role="menuitem" tabindex="-1" href="http://stackoverflow.com/questions/tagged/parquet"><i class="fa fa-stack-overflow"></i> StackOverflow</a></li> - <li role="presentation"><a role="menuitem" tabindex="-1" href="https://plus.google.com/events/c36apc97f7invko9p128hq9e6b8"><i class="fa fa-google"></i> Google Hangout</a></li> </ul> </li> </ul> @@ -144,7 +143,7 @@ <div class="row-fluid"> <div class="span12 text-left"> <div class="span12"> - Copyright 2014 <a 
href="http://www.apache.org/">Apache Software Foundation</a>. Licensed under the <a href="http://www.apache.org/licenses/">Apache License v2.0</a>. Apache Parquet and the Apache feather logo are trademarks of The Apache Software Foundation. + Copyright 2018 <a href="http://www.apache.org/">Apache Software Foundation</a>. Licensed under the <a href="http://www.apache.org/licenses/">Apache License v2.0</a>. Apache Parquet and the Apache feather logo are trademarks of The Apache Software Foundation. </div> </div> Modified: parquet/site/source/_footer.md.erb URL: http://svn.apache.org/viewvc/parquet/site/source/_footer.md.erb?rev=1832532&r1=1832531&r2=1832532&view=diff ============================================================================== --- parquet/site/source/_footer.md.erb (original) +++ parquet/site/source/_footer.md.erb Wed May 30 11:01:59 2018 @@ -4,7 +4,7 @@ <div class="row-fluid"> <div class="span12 text-left"> <div class="span12"> - Copyright 2014 <a href="http://www.apache.org/">Apache Software Foundation</a>. Licensed under the <a href="http://www.apache.org/licenses/">Apache License v2.0</a>. Apache Parquet and the Apache feather logo are trademarks of The Apache Software Foundation. + Copyright 2018 <a href="http://www.apache.org/">Apache Software Foundation</a>. Licensed under the <a href="http://www.apache.org/licenses/">Apache License v2.0</a>. Apache Parquet and the Apache feather logo are trademarks of The Apache Software Foundation. 
</div> </div> Modified: parquet/site/source/_header.md.erb URL: http://svn.apache.org/viewvc/parquet/site/source/_header.md.erb?rev=1832532&r1=1832531&r2=1832532&view=diff ============================================================================== --- parquet/site/source/_header.md.erb (original) +++ parquet/site/source/_header.md.erb Wed May 30 11:01:59 2018 @@ -21,12 +21,11 @@ <a href="/community">Get Involved <span class="caret"></span></a> <ul class="dropdown-menu" role="menu" aria-labelledby="drop1"> <li role="presentation"><a role="menuitem" tabindex="-1" href="http://issues.apache.org/jira/browse/parquet"><i class="fa fa-bug"></i> JIRA (Bugs)</a></li> - <li role="presentation"><a role="menuitem" tabindex="-1" href="https://git-wip-us.apache.org/repos/asf?s=parquet"><i class="fa fa-code"></i> Source (Apache)</a></li> + <li role="presentation"><a role="menuitem" tabindex="-1" href="https://gitbox.apache.org/repos/asf?p=parquet-mr.git"><i class="fa fa-code"></i> Source (Apache)</a></li> <li role="presentation"><a role="menuitem" tabindex="-1" href="https://github.com/apache/parquet-mr"><i class="fa fa-github-alt"></i> Source (GitHub)</a></li> <li role="presentation"><a role="menuitem" tabindex="-1" href="/contribute"><i class="fa fa-code-fork"></i> Contributing</a></li> <li role="presentation"><a role="menuitem" tabindex="-1" href="https://twitter.com/ApacheParquet"><i class="fa fa-twitter"></i> @ApacheParquet</a></li> <li role="presentation"><a role="menuitem" tabindex="-1" href="http://stackoverflow.com/questions/tagged/parquet"><i class="fa fa-stack-overflow"></i> StackOverflow</a></li> - <li role="presentation"><a role="menuitem" tabindex="-1" href="https://plus.google.com/events/c36apc97f7invko9p128hq9e6b8"><i class="fa fa-google"></i> Google Hangout</a></li> </ul> </li> </ul> Modified: parquet/site/source/community.html.md URL: http://svn.apache.org/viewvc/parquet/site/source/community.html.md?rev=1832532&r1=1832531&r2=1832532&view=diff 
============================================================================== --- parquet/site/source/community.html.md (original) +++ parquet/site/source/community.html.md Wed May 30 11:01:59 2018 @@ -17,7 +17,7 @@ <h4 name="contribute">Contribute a core patch</h4> <p>Follow our <a href="/contribute/">contribution guidelines</a> when submitting a patch.</p> <h3>Stay in Touch</h3> - <p>Attend a Parquet <a href="https://plus.google.com/events/c36apc97f7invko9p128hq9e6b8">sync up meeting</a> via Google Hangouts.</p> + <p>Attend a Parquet sync up meeting via Google Hangouts. These meetings are hold ad-hoc, announced on the developer list about a week before the meeting.</p> <h3>Bylaws</h3> <p><a href="/bylaws/">Parquet community bylaws</a></p> Modified: parquet/site/source/documentation/latest.html.md URL: http://svn.apache.org/viewvc/parquet/site/source/documentation/latest.html.md?rev=1832532&r1=1832531&r2=1832532&view=diff ============================================================================== --- parquet/site/source/documentation/latest.html.md (original) +++ parquet/site/source/documentation/latest.html.md Wed May 30 11:01:59 2018 @@ -2,7 +2,7 @@ We created Parquet to make the advantages of compressed, efficient columnar data representation available to any project in the Hadoop ecosystem. -Parquet is built from the ground up with complex nested data structures in mind, and uses the [record shredding and assembly algorithm](https://github.com/Parquet/parquet-mr/wiki/The-striping-and-assembly-algorithms-from-the-Dremel-paper) described in the Dremel paper. We believe this approach is superior to simple flattening of nested name spaces. +Parquet is built from the ground up with complex nested data structures in mind, and uses the [record shredding and assembly algorithm](https://github.com/julienledem/redelm/wiki/The-striping-and-assembly-algorithms-from-the-Dremel-paper) described in the Dremel paper. 
We believe this approach is superior to simple flattening of nested name spaces. Parquet is built to support very efficient compression and encoding schemes. Multiple projects have demonstrated the performance impact of applying the right compression and encoding scheme to the data. Parquet allows compression schemes to be specified on a per-column level, and is future-proofed to allow adding more encodings as they are invented and implemented. @@ -10,11 +10,15 @@ Parquet is built to be used by anyone. T ## Modules -The `parquet-format` project contains format specifications and Thrift definitions of metadata required to properly read Parquet files. +The [parquet-format](https://github.com/apache/parquet-format) project contains format specifications and Thrift definitions of metadata required to properly read Parquet files. -The `parquet-mr` project contains multiple sub-modules, which implement the core components of reading and writing a nested, column-oriented data stream, map this core onto the parquet format, and provide Hadoop Input/Output Formats, Pig loaders, and other java-based utilities for interacting with Parquet. +The [parquet-mr](https://github.com/apache/parquet-mr) project contains multiple sub-modules, which implement the core components of reading and writing a nested, column-oriented data stream, map this core onto the parquet format, and provide Hadoop Input/Output Formats, Pig loaders, and other Java-based utilities for interacting with Parquet. -The `parquet-compatibility` project contains compatibility tests that can be used to verify that implementations in different languages can read and write each other's files. +The [parquet-cpp](https://github.com/apache/parquet-cpp) project is a C++ library to read-write Parquet files. + +The [parquet-rs](https://github.com/sunchao/parquet-rs) project is a Rust library to read-write Parquet files. 
+ +The [parquet-compatibility](https://github.com/Parquet/parquet-compatibility) project contains compatibility tests that can be used to verify that implementations in different languages can read and write each other's files. ## Building @@ -89,27 +93,28 @@ Metadata is written after the data to al Readers are expected to first read the file metadata to find all the column chunks they are interested in. The columns chunks should then be read sequentially. -  +  ## Metadata There are three types of metadata: file metadata, column (chunk) metadata and page header metadata. All thrift structures are serialized using the TCompactProtocol. -  +  ## Types The types supported by the file format are intended to be as minimal as possible, -with a focus on how the types effect on disk storage. For example, 16-bit ints +with a focus on how the types effect on disk storage. For example, 16-bit ints are not explicitly supported in the storage format since they are covered by -32-bit ints with an efficient encoding. This reduces the complexity of implementing -readers and writers for the format. The types are: - - BOOLEAN: 1 bit boolean - - INT32: 32 bit signed ints - - INT64: 64 bit signed ints - - INT96: 96 bit signed ints - - FLOAT: IEEE 32-bit floating point values - - DOUBLE: IEEE 64-bit floating point values - - BYTE_ARRAY: arbitrarily long byte arrays. +32-bit ints with an efficient encoding. This reduces the complexity of implementing +readers and writers for the format. The types are: + + - *BOOLEAN*: 1 bit boolean + - *INT32*: 32 bit signed ints + - *INT64*: 64 bit signed ints + - *INT96*: 96 bit signed ints + - *FLOAT*: IEEE 32-bit floating point values + - *DOUBLE*: IEEE 64-bit floating point values + - *BYTE_ARRAY*: arbitrarily long byte arrays. ### Logical Types Logical types are used to extend the types that parquet can be used to store, @@ -121,7 +126,7 @@ Annotations are stored as a `ConvertedTy documented in [LogicalTypes.md][logical-types]. 
-[logical-types]: https://github.com/Parquet/parquet-format/blob/master/LogicalTypes.md +[logical-types]: https://github.com/apache/parquet-format/blob/master/LogicalTypes.md ## Nested Encoding To encode nested columns, Parquet uses the Dremel encoding with definition and @@ -142,7 +147,8 @@ nothing else. ## Data Pages For data pages, the 3 pieces of information are encoded back to back, after the page -header. We have the +header. We have the + - definition levels data, - repetition levels data, - encoded values. @@ -157,7 +163,7 @@ skipped (if encoded, it will always have For example, in the case where the column is non-nested and required, the data in the page is only the encoded values. -The supported encodings are described in [Encodings.md](https://github.com/Parquet/parquet-format/blob/master/Encodings.md) +The supported encodings are described in [Encodings.md](https://github.com/apache/parquet-format/blob/master/Encodings.md) ## Column chunks Column chunks are composed of pages written back to back. The pages share a common @@ -185,7 +191,7 @@ far. Combining this with the strategy u a reader could recover partially written files. ## Separating metadata and column data. -The format is explicitly designed to separate the metadata from the data. This +The format is explicitly designed to separate the metadata from the data. This allows splitting columns into multiple files, as well as having a single metadata file reference multiple parquet files. @@ -205,6 +211,7 @@ at a time; this is not the IO chunk. We ## Extensibility There are many places in the format for compatible extensions: + - File Version: The file metadata contains a version. - Encodings: Encodings are specified by enum and more can be added in the future. - Page types: Additional page types can be added and safely skipped.
