Repository: arrow-site Updated Branches: refs/heads/asf-site 4c7c2f1b2 -> 301d577e0
Update docs Project: http://git-wip-us.apache.org/repos/asf/arrow-site/repo Commit: http://git-wip-us.apache.org/repos/asf/arrow-site/commit/301d577e Tree: http://git-wip-us.apache.org/repos/asf/arrow-site/tree/301d577e Diff: http://git-wip-us.apache.org/repos/asf/arrow-site/diff/301d577e Branch: refs/heads/asf-site Commit: 301d577e09df11f95edda377da499215a7f15ec7 Parents: 4c7c2f1 Author: Antoine Pitrou <anto...@python.org> Authored: Mon Apr 9 10:41:14 2018 +0200 Committer: Antoine Pitrou <anto...@python.org> Committed: Mon Apr 9 10:41:14 2018 +0200 ---------------------------------------------------------------------- blog/index.html | 2 +- committers/index.html | 6 ++++++ docs/ipc.html | 47 ++++++++++++++++++++++++++++++++++++++++++-- docs/memory_layout.html | 11 +++++------ docs/metadata.html | 3 ++- feed.xml | 4 ++-- 6 files changed, 61 insertions(+), 12 deletions(-) ---------------------------------------------------------------------- http://git-wip-us.apache.org/repos/asf/arrow-site/blob/301d577e/blog/index.html ---------------------------------------------------------------------- diff --git a/blog/index.html b/blog/index.html index cfddf8f..ddad423 100644 --- a/blog/index.html +++ b/blog/index.html @@ -489,7 +489,7 @@ implementations and bindings to more languages.</p> <div class="container"> <h2> Improvements to Java Vector API in Apache Arrow 0.8.0 - <a href="/blog/2017/12/19/java-vector-improvements/" class="permalink" title="Permalink">∞</a> + <a href="/blog/2017/12/18/java-vector-improvements/" class="permalink" title="Permalink">∞</a> </h2> http://git-wip-us.apache.org/repos/asf/arrow-site/blob/301d577e/committers/index.html ---------------------------------------------------------------------- diff --git a/committers/index.html b/committers/index.html index 10a508e..9cbf10e 100644 --- a/committers/index.html +++ b/committers/index.html @@ -292,6 +292,12 @@ <td>ptaylor</td> <td>Graphistry</td> </tr> +<tr> +<td>Antoine Pitrou</td> +<td>Committer</td> 
+<td>apitrou</td> +<td>Independent / Two Sigma</td> +</tr> </tbody></table> </div> <!-- /container --> http://git-wip-us.apache.org/repos/asf/arrow-site/blob/301d577e/docs/ipc.html ---------------------------------------------------------------------- diff --git a/docs/ipc.html b/docs/ipc.html index a825908..5022c80 100644 --- a/docs/ipc.html +++ b/docs/ipc.html @@ -146,7 +146,7 @@ <ul> <li>A length prefix indicating the metadata size</li> - <li>The message metadata as a <a href="https://github.com/google]/flatbuffers">Flatbuffer</a></li> + <li>The message metadata as a <a href="https://github.com/google/flatbuffers">Flatbuffer</a></li> <li>Padding bytes to an 8-byte boundary</li> <li>The message body, which must be a multiple of 8 bytes</li> </ul> @@ -191,7 +191,9 @@ flatbuffer union), and the size of the message body:</p> of encapsulated messages, each of which follows the format above. The schema comes first in the stream, and it is the same for all of the record batches that follow. If any fields in the schema are dictionary-encoded, one or more -<code class="highlighter-rouge">DictionaryBatch</code> messages will follow the schema.</p> +<code class="highlighter-rouge">DictionaryBatch</code> messages will be included. <code class="highlighter-rouge">DictionaryBatch</code> and +<code class="highlighter-rouge">RecordBatch</code> messages may be interleaved, but before any dictionary key is used +in a <code class="highlighter-rouge">RecordBatch</code> it should be defined in a <code class="highlighter-rouge">DictionaryBatch</code>.</p> <div class="highlighter-rouge"><pre class="highlight"><code><SCHEMA> <DICTIONARY 0> @@ -199,6 +201,10 @@ that follow. If any fields in the schema are dictionary-encoded, one or more <DICTIONARY k - 1> <RECORD BATCH 0> ... +<DICTIONARY x DELTA> +... +<DICTIONARY y DELTA> +... 
<RECORD BATCH n - 1> <EOS [optional]: int32> </code></pre> @@ -233,6 +239,10 @@ footer.</p> </code></pre> </div> +<p>In the file format, there is no requirement that dictionary keys should be +defined in a <code class="highlighter-rouge">DictionaryBatch</code> before they are used in a <code class="highlighter-rouge">RecordBatch</code>, as long +as the keys are defined somewhere in the file.</p> + <h3 id="recordbatch-body-structure">RecordBatch body structure</h3> <p>The <code class="highlighter-rouge">RecordBatch</code> metadata contains a depth-first (pre-order) flattened set of @@ -306,6 +316,7 @@ the dictionaries can be properly interpreted.</p> <div class="highlighter-rouge"><pre class="highlight"><code>table DictionaryBatch { id: long; data: RecordBatch; + isDelta: boolean = false; } </code></pre> </div> @@ -315,6 +326,38 @@ in the schema, so that dictionaries can even be used for multiple fields. See the <a href="https://github.com/apache/arrow/blob/master/format/Layout.md">Physical Layout</a> document for more about the semantics of dictionary-encoded data.</p> +<p>The dictionary <code class="highlighter-rouge">isDelta</code> flag allows dictionary batches to be modified +mid-stream. A dictionary batch with <code class="highlighter-rouge">isDelta</code> set indicates that its vector +should be concatenated with those of any previous batches with the same <code class="highlighter-rouge">id</code>. 
A +stream which encodes one column, the list of strings +<code class="highlighter-rouge">["A", "B", "C", "B", "D", "C", "E", "A"]</code>, with a delta dictionary batch could +take the form:</p> + +<div class="highlighter-rouge"><pre class="highlight"><code><SCHEMA> +<DICTIONARY 0> +(0) "A" +(1) "B" +(2) "C" + +<RECORD BATCH 0> +0 +1 +2 +1 + +<DICTIONARY 0 DELTA> +(3) "D" +(4) "E" + +<RECORD BATCH 1> +3 +2 +4 +0 +EOS +</code></pre> +</div> + <h3 id="tensor-multi-dimensional-array-message-format">Tensor (Multi-dimensional Array) Message Format</h3> <p>The <code class="highlighter-rouge">Tensor</code> message types provides a way to write a multidimensional array of http://git-wip-us.apache.org/repos/asf/arrow-site/blob/301d577e/docs/memory_layout.html ---------------------------------------------------------------------- diff --git a/docs/memory_layout.html b/docs/memory_layout.html index 10fc82c..ff8f9e8 100644 --- a/docs/memory_layout.html +++ b/docs/memory_layout.html @@ -162,9 +162,8 @@ from <code class="highlighter-rouge">List<V></code> iff U and V are differ or a fully-specified nested type. When we say slot we mean a relative type value, not necessarily any physical storage region.</li> <li>Logical type: A data type that is implemented using some relative (physical) -type. For example, a Decimal value stored in 16 bytes could be stored in a -primitive array with slot size 16 bytes. Similarly, strings can be stored as -<code class="highlighter-rouge">List<1-byte></code>.</li> +type. For example, Decimal values are stored as 16 bytes in a fixed byte +size array. Similarly, strings can be stored as <code class="highlighter-rouge">List<1-byte></code>.</li> <li>Parent and child arrays: names to express relationships between physical value arrays in a nested type structure. 
For example, a <code class="highlighter-rouge">List<T></code>-type parent array has a T-type array as its child (see more on lists below).</li> @@ -753,9 +752,9 @@ the the types array indicates that a slot contains a different type at the index <h2 id="dictionary-encoding">Dictionary encoding</h2> <p>When a field is dictionary encoded, the values are represented by an array of Int32 representing the index of the value in the dictionary. -The Dictionary is received as a DictionaryBatch whose id is referenced by a dictionary attribute defined in the metadata (<a href="https://github.com/apache/arrow/blob/master/format/Message.fbs">Message.fbs</a>) in the Field table. -The dictionary has the same layout as the type of the field would dictate. Each entry in the dictionary can be accessed by its index in the DictionaryBatch. -When a Schema references a Dictionary id, it must send a DictionaryBatch for this id before any RecordBatch.</p> +The Dictionary is received as one or more DictionaryBatches with the id referenced by a dictionary attribute defined in the metadata (<a href="https://github.com/apache/arrow/blob/master/format/Message.fbs">Message.fbs</a>) in the Field table. +The dictionary has the same layout as the type of the field would dictate. Each entry in the dictionary can be accessed by its index in the DictionaryBatches. 
+When a Schema references a Dictionary id, it must send at least one DictionaryBatch for this id.</p> <p>As an example, you could have the following data:</p> <div class="highlighter-rouge"><pre class="highlight"><code>type: List<String> http://git-wip-us.apache.org/repos/asf/arrow-site/blob/301d577e/docs/metadata.html ---------------------------------------------------------------------- diff --git a/docs/metadata.html b/docs/metadata.html index df36202..858f0c0 100644 --- a/docs/metadata.html +++ b/docs/metadata.html @@ -531,7 +531,8 @@ logical type, which have no children) and 3 buffers:</p> <h3 id="decimal">Decimal</h3> -<p>TBD</p> +<p>Decimals are represented as a 2’s complement 128-bit (16 byte) signed integer +in little-endian byte order.</p> <h3 id="timestamp">Timestamp</h3> http://git-wip-us.apache.org/repos/asf/arrow-site/blob/301d577e/feed.xml ---------------------------------------------------------------------- diff --git a/feed.xml b/feed.xml index e9e13a6..98d48d5 100644 --- a/feed.xml +++ b/feed.xml @@ -1,4 +1,4 @@ -<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.4.3">Jekyll</generator><link href="/feed.xml" rel="self" type="application/atom+xml" /><link href="/" rel="alternate" type="text/html" /><updated>2018-03-22T09:11:11-04:00</updated><id>/</id><entry><title type="html">A Native Go Library for Apache Arrow</title><link href="/blog/2018/03/22/go-code-donation/" rel="alternate" type="text/html" title="A Native Go Library for Apache Arrow" /><published>2018-03-22T00:00:00-04:00</published><updated>2018-03-22T00:00:00-04:00</updated><id>/blog/2018/03/22/go-code-donation</id><content type="html" xml:base="/blog/2018/03/22/go-code-donation/"><!-- +<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.4.3">Jekyll</generator><link href="/feed.xml" rel="self" type="application/atom+xml" /><link 
href="/" rel="alternate" type="text/html" /><updated>2018-04-09T04:33:24-04:00</updated><id>/</id><entry><title type="html">A Native Go Library for Apache Arrow</title><link href="/blog/2018/03/22/go-code-donation/" rel="alternate" type="text/html" title="A Native Go Library for Apache Arrow" /><published>2018-03-22T00:00:00-04:00</published><updated>2018-03-22T00:00:00-04:00</updated><id>/blog/2018/03/22/go-code-donation</id><content type="html" xml:base="/blog/2018/03/22/go-code-donation/"><!-- --> @@ -266,7 +266,7 @@ working to improve and expand the libraries in support of downstream use cases.& <p>We continue to look for more JavaScript, Julia, R, Rust, and other programming language developers to join the project and expand the available -implementations and bindings to more languages.</p></content><author><name>wesm</name></author></entry><entry><title type="html">Improvements to Java Vector API in Apache Arrow 0.8.0</title><link href="/blog/2017/12/19/java-vector-improvements/" rel="alternate" type="text/html" title="Improvements to Java Vector API in Apache Arrow 0.8.0" /><published>2017-12-18T19:00:00-05:00</published><updated>2017-12-18T19:00:00-05:00</updated><id>/blog/2017/12/19/java-vector-improvements</id><content type="html" xml:base="/blog/2017/12/19/java-vector-improvements/"><!-- +implementations and bindings to more languages.</p></content><author><name>wesm</name></author></entry><entry><title type="html">Improvements to Java Vector API in Apache Arrow 0.8.0</title><link href="/blog/2017/12/18/java-vector-improvements/" rel="alternate" type="text/html" title="Improvements to Java Vector API in Apache Arrow 0.8.0" /><published>2017-12-18T19:00:00-05:00</published><updated>2017-12-18T19:00:00-05:00</updated><id>/blog/2017/12/18/java-vector-improvements</id><content type="html" xml:base="/blog/2017/12/18/java-vector-improvements/"><!-- -->
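
Illustration of the delta dictionary semantics that the docs/ipc.html change in this commit describes: a dictionary batch with isDelta set has its values concatenated onto the earlier batches with the same id. The following is only a minimal sketch and not Arrow library code. The tuple-based message representation and the decode_stream helper are hypothetical stand-ins for parsed stream messages, and treating a non-delta batch as the initial definition of a dictionary is an assumption made for this example.

```python
# Hypothetical in-memory stand-ins for parsed stream messages (not an Arrow API):
#   ("dictionary", dict_id, is_delta, values)
#   ("record_batch", dict_id, indices)

def decode_stream(messages):
    """Decode a single dictionary-encoded column from a list of messages."""
    dictionaries = {}  # dictionary id -> list of values seen so far
    decoded = []
    for message in messages:
        if message[0] == "dictionary":
            _, dict_id, is_delta, values = message
            if is_delta:
                # Delta batch: concatenate onto the existing dictionary.
                # A KeyError here means the delta arrived before any definition.
                dictionaries[dict_id] = dictionaries[dict_id] + list(values)
            else:
                # Assumption for this sketch: a non-delta batch defines the dictionary.
                dictionaries[dict_id] = list(values)
        else:
            _, dict_id, indices = message
            dictionary = dictionaries[dict_id]
            # Each index must already be defined at this point in the stream.
            decoded.extend(dictionary[i] for i in indices)
    return decoded


# The example stream from the ipc.html text added in this commit.
stream = [
    ("dictionary", 0, False, ["A", "B", "C"]),
    ("record_batch", 0, [0, 1, 2, 1]),
    ("dictionary", 0, True, ["D", "E"]),
    ("record_batch", 0, [3, 2, 4, 0]),
]
result = decode_stream(stream)
print(result)
```

In this sketch the decoded column comes out as the list of strings given in the documentation example, since indices 3 and 4 only become valid once the delta batch has been applied.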