Repository: avro
Updated Branches:
  refs/heads/master d7e123148 -> 30408a9c1
AVRO-1704: Add single-record encoding spec. (Contributed by Niels Basjes)


Project: http://git-wip-us.apache.org/repos/asf/avro/repo
Commit: http://git-wip-us.apache.org/repos/asf/avro/commit/30408a9c
Tree: http://git-wip-us.apache.org/repos/asf/avro/tree/30408a9c
Diff: http://git-wip-us.apache.org/repos/asf/avro/diff/30408a9c

Branch: refs/heads/master
Commit: 30408a9c192c5f4eaaf42f01f0ffbfffd705aa57
Parents: d7e1231
Author: Ryan Blue <[email protected]>
Authored: Sun Jul 24 15:47:36 2016 -0700
Committer: Ryan Blue <[email protected]>
Committed: Sun Sep 4 13:42:34 2016 -0700

----------------------------------------------------------------------
 CHANGES.txt                    |  2 ++
 doc/src/content/xdocs/spec.xml | 36 ++++++++++++++++++++++++++++++++----
 2 files changed, 34 insertions(+), 4 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/avro/blob/30408a9c/CHANGES.txt
----------------------------------------------------------------------
diff --git a/CHANGES.txt b/CHANGES.txt
index 19f921b..3e329aa 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -8,6 +8,8 @@ Trunk (not yet released)
 
     AVRO-1704: Java: Add support for single-message encoding. (blue)
 
+    AVRO-1704: Spec: Add single-message encoding format. (Niels Basjes via blue)
+
   OPTIMIZATIONS
 
   IMPROVEMENTS


http://git-wip-us.apache.org/repos/asf/avro/blob/30408a9c/doc/src/content/xdocs/spec.xml
----------------------------------------------------------------------
diff --git a/doc/src/content/xdocs/spec.xml b/doc/src/content/xdocs/spec.xml
index ec1f199..917d314 100644
--- a/doc/src/content/xdocs/spec.xml
+++ b/doc/src/content/xdocs/spec.xml
@@ -487,18 +487,18 @@
           value, followed by that many key/value pairs. A block with count zero indicates the end of the map.
           Each item is encoded per the map's value schema.</p>
-
+
           <p>If a block's count is negative, its absolute value is used, and the count is followed immediately by a <code>long</code> block <em>size</em> indicating the number of bytes in the block. This block size permits fast skipping through data, e.g., when projecting a record to a subset of its fields.</p>
-
+
           <p>The blocked representation permits one to read and write maps larger than can be buffered in memory, since one can start writing items without knowing the full length of the map.</p>
-
+
         </section>
 
         <section id="union_encoding">
@@ -569,6 +569,34 @@
 
       </section>
 
+      <section id="single_object_encoding">
+        <title>Single-object encoding</title>
+
+        <p>In some situations a single Avro serialized object is to be stored for a
+        longer period of time. One very common example is storing Avro records
+        for several weeks in an <a href="http://kafka.apache.org/">Apache Kafka</a> topic.</p>
+        <p>In the period after a schema change this persistence system will contain records
+        that have been written with different schemas. So the need arises to know which schema
+        was used to write a record to support schema evolution correctly.
+        In most cases the schema itself is too large to include in the message,
+        so this binary wrapper format supports the use case more effectively.</p>
+
+        <section id="single_object_encoding_spec">
+          <title>Single object encoding specification</title>
+          <p>Single Avro objects are encoded as follows:</p>
+          <ol>
+            <li>A two-byte marker, <code>C3 01</code>, to show that the message is Avro and uses this single-record format (version 1).</li>
+            <li>The 8-byte little-endian CRC-64-AVRO <a href="#schema_fingerprints">fingerprint</a> of the object's schema</li>
+            <li>The Avro object encoded using <a href="#binary_encoding">Avro's binary encoding</a></li>
+          </ol>
+        </section>
+
+        <p>Implementations use the 2-byte marker to determine whether a payload is Avro.
+        This check helps avoid expensive lookups that resolve the schema from a
+        fingerprint, when the message is not an encoded Avro payload.</p>
+
+      </section>
+
     </section>
 
     <section id="order">
@@ -1237,7 +1265,7 @@
       </ul>
     </section>
 
-    <section>
+    <section id="schema_fingerprints">
       <title>Schema Fingerprints</title>
 
       <p>"[A] fingerprinting algorithm is a procedure that maps an
