This is an automated email from the ASF dual-hosted git repository.
vinoth pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git
The following commit(s) were added to refs/heads/asf-site by this push:
new 09606d3 Travis CI build asf-site
09606d3 is described below
commit 09606d31a5252cee3bb05c1a201482feed810c06
Author: CI
AuthorDate: Thu Jul 22 04:11:24 2021 +0000
Travis CI build asf-site
---
content/docs/writing_data.html | 258 +++++++++++++++++++++++++++++++++++++++++
1 file changed, 258 insertions(+)
diff --git a/content/docs/writing_data.html b/content/docs/writing_data.html
index 719cc0d..e1f90e4 100644
--- a/content/docs/writing_data.html
+++ b/content/docs/writing_data.html
@@ -367,6 +367,7 @@
<li><a href="#syncing-to-hive">Syncing to Hive</a></li>
<li><a href="#deletes">Deletes</a></li>
<li><a href="#optimized-dfs-access">Optimized DFS Access</a></li>
+ <li><a href="#schema-evolution">Schema Evolution</a></li>
</ul>
</nav>
</aside>
@@ -876,6 +877,263 @@ once created cannot be deleted, but simply expanded as
explained before.</li>
<li>For workloads with heavy updates, the <a
href="/docs/concepts.html#merge-on-read-table">merge-on-read table</a> provides
a nice mechanism for ingesting quickly into smaller files and then later
merging them into larger base files via compaction.</li>
</ul>
+<h2 id="schema-evolution">Schema Evolution</h2>
+
+<p>Schema evolution is an important aspect of data management.
+Hudi supports common schema evolution scenarios, such as adding a nullable field or promoting the datatype of a field, out of the box.
+Furthermore, the evolved schema is queryable across engines such as Presto, Hive and Spark SQL.
+The following table summarizes which types of schema changes are compatible with each Hudi table type: copy-on-write (COW) and merge-on-read (MOR).</p>
+
+<table>
+ <thead>
+ <tr>
+ <th>Schema Change</th>
+ <th>COW</th>
+ <th>MOR</th>
+ <th>Remarks</th>
+ </tr>
+ </thead>
+ <tbody>
+ <tr>
+ <td>Add a new nullable column at root level at the end</td>
+ <td>Yes</td>
+ <td>Yes</td>
+ <td><code class="highlighter-rouge">Yes</code> means that a write with the evolved schema succeeds and a subsequent read of the entire dataset also succeeds.</td>
+ </tr>
+ <tr>
+ <td>Add a new nullable column to inner struct (at the end)</td>
+ <td>Yes</td>
+ <td>Yes</td>
+ <td> </td>
+ </tr>
+ <tr>
+ <td>Add a new complex type field with default (map and array)</td>
+ <td>Yes</td>
+ <td>Yes</td>
+ <td> </td>
+ </tr>
+ <tr>
+ <td>Add a new nullable column and change the ordering of fields</td>
+ <td>No</td>
+ <td>No</td>
+ <td>The write succeeds, but the read fails if the write with the evolved schema updated only some of the base files. Currently, Hudi does not maintain a schema registry with a history of changes across base files. Nevertheless, if the upsert touched all base files, the read succeeds.</td>
+ </tr>
+ <tr>
+ <td>Add a custom nullable Hudi meta column, e.g. <code
class="highlighter-rouge">_hoodie_meta_col</code></td>
+ <td>Yes</td>
+ <td>Yes</td>
+ <td> </td>
+ </tr>
+ <tr>
+ <td>Promote datatype from <code class="highlighter-rouge">int</code> to
<code class="highlighter-rouge">long</code> for a field at root level</td>
+ <td>Yes</td>
+ <td>Yes</td>
+ <td>For other types, Hudi supports promotion as specified in <a
href="http://avro.apache.org/docs/current/spec.html#Schema+Resolution">Avro
schema resolution</a>.</td>
+ </tr>
+ <tr>
+ <td>Promote datatype from <code class="highlighter-rouge">int</code> to
<code class="highlighter-rouge">long</code> for a nested field</td>
+ <td>Yes</td>
+ <td>Yes</td>
+ <td> </td>
+ </tr>
+ <tr>
+ <td>Promote datatype from <code class="highlighter-rouge">int</code> to
<code class="highlighter-rouge">long</code> for a complex type (value of map or
array)</td>
+ <td>Yes</td>
+ <td>Yes</td>
+ <td> </td>
+ </tr>
+ <tr>
+ <td>Add a new non-nullable column at root level at the end</td>
+ <td>No</td>
+ <td>No</td>
+ <td>For a MOR table written through the Spark data source, the write succeeds but the read fails. As a <strong>workaround</strong>, make the new field nullable.</td>
+ </tr>
+ <tr>
+ <td>Add a new non-nullable column to inner struct (at the end)</td>
+ <td>No</td>
+ <td>No</td>
+ <td> </td>
+ </tr>
+ <tr>
+ <td>Change datatype from <code class="highlighter-rouge">long</code> to
<code class="highlighter-rouge">int</code> for a nested field</td>
+ <td>No</td>
+ <td>No</td>
+ <td> </td>
+ </tr>
+ <tr>
+ <td>Change datatype from <code class="highlighter-rouge">long</code> to
<code class="highlighter-rouge">int</code> for a complex type (value of map or
array)</td>
+ <td>No</td>
+ <td>No</td>
+ <td> </td>
+ </tr>
+ </tbody>
+</table>
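+<p>For root-level columns, the rules in the table can be summarized as a small check. The Scala sketch below is a hypothetical helper and not part of Hudi. The names <code class="highlighter-rouge">SchemaCompat</code>, <code class="highlighter-rouge">Field</code> and <code class="highlighter-rouge">check</code> are illustrative only. The sketch models each field as a name, a type name and a nullability flag. It accepts only the root-level changes that the table marks as supported, which are appending nullable columns and promoting <code class="highlighter-rouge">int</code> to <code class="highlighter-rouge">long</code>. It conservatively rejects everything else, including dropping or reordering existing columns. Hudi also supports other promotions as specified by Avro schema resolution, and this sketch omits them.</p>

```scala
// A minimal, hypothetical sketch (not a Hudi API) of the root-level rules in
// the compatibility table: append nullable columns, promote int to long.
object SchemaCompat {

  // Simplified field model: name, primitive type name, and nullability.
  final case class Field(name: String, dataType: String, nullable: Boolean)

  // Accept an unchanged type or the int -> long promotion from the table.
  // Avro schema resolution allows further promotions, omitted here.
  private def typeChangeAllowed(from: String, to: String): Boolean =
    from == to || (from == "int" && to == "long")

  /** Returns None if the change looks compatible, else Some(reason). */
  def check(oldSchema: Seq[Field], newSchema: Seq[Field]): Option[String] = {
    // Existing columns must stay, in their original order, at the front.
    val kept = newSchema.take(oldSchema.length)
    if (kept.map(_.name) != oldSchema.map(_.name)) {
      Some("existing fields were removed or reordered")
    } else {
      // Compare the type of every existing column with its new type.
      val badType = oldSchema.zip(kept).collectFirst {
        case (o, n) if !typeChangeAllowed(o.dataType, n.dataType) =>
          s"unsupported type change for ${o.name}: ${o.dataType} -> ${n.dataType}"
      }
      // Every appended column must be nullable.
      val badAddition = newSchema.drop(oldSchema.length).collectFirst {
        case f if !f.nullable => s"new field ${f.name} must be nullable"
      }
      badType.orElse(badAddition)
    }
  }
}

object SchemaCompatExample {
  import SchemaCompat.Field

  def main(args: Array[String]): Unit = {
    // Mirrors the two schemas used in the spark-shell example that follows.
    val v1 = Seq(
      Field("rowId", "string", nullable = true),
      Field("partitionId", "string", nullable = true),
      Field("preComb", "long", nullable = true),
      Field("name", "string", nullable = true),
      Field("versionId", "string", nullable = true),
      Field("intToLong", "int", nullable = true))
    val v2 = v1.init :+
      Field("intToLong", "long", nullable = true) :+
      Field("newField", "string", nullable = true)

    println(SchemaCompat.check(v1, v2)) // None
    // Rejected: the appended column is non-nullable.
    println(SchemaCompat.check(v1, v1 :+ Field("extra", "string", nullable = false)))
    // Rejected: existing columns are reordered.
    println(SchemaCompat.check(v1, v1.reverse))
  }
}
```

+<p>Returning an <code class="highlighter-rouge">Option[String]</code> instead of a <code class="highlighter-rouge">Boolean</code> keeps the reason for a rejection. This is useful when a check like this runs as a validation step before a write.</p>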
+
+<p>Let us walk through an example that demonstrates schema evolution support in Hudi.
+In the example below, we add a new string field and change the datatype of a field from int to long.</p>
+
+<div class="language-java highlighter-rouge"><div class="highlight"><pre
class="highlight"><code><span class="nc">Welcome</span> <span
class="n">to</span>
+ <span class="n">____</span> <span class="n">__</span>
+ <span class="o">/</span> <span class="n">__</span><span
class="o">/</span><span class="n">__</span> <span class="n">___</span> <span
class="n">_____</span><span class="o">/</span> <span class="o">/</span><span
class="n">__</span>
+ <span class="n">_</span><span class="err">\</span> <span
class="err">\</span><span class="o">/</span> <span class="n">_</span> <span
class="err">\</span><span class="o">/</span> <span class="n">_</span> <span
class="err">`</span><span class="o">/</span> <span class="n">__</span><span
class="o">/</span> <span class="err">'</span><span class="n">_</span><span
class="o">/</span>
+ <span class="o">/</span><span class="n">___</span><span class="o">/</span>
<span class="o">.</span><span class="na">__</span><span class="o">/</span><span
class="err">\</span><span class="n">_</span><span class="o">,</span><span
class="n">_</span><span class="o">/</span><span class="n">_</span><span
class="o">/</span> <span class="o">/</span><span class="n">_</span><span
class="o">/</span><span class="err">\</span><span class="n">_</span><span
class="err">\</span> <span class="n">v [...]
+ <span class="o">/</span><span class="n">_</span><span class="o">/</span>
+
+ <span class="nc">Using</span> <span class="nc">Scala</span> <span
class="n">version</span> <span class="mf">2.12</span><span
class="o">.</span><span class="mi">10</span> <span class="o">(</span><span
class="nc">OpenJDK</span> <span class="mi">64</span><span
class="o">-</span><span class="nc">Bit</span> <span class="nc">Server</span>
<span class="no">VM</span><span class="o">,</span> <span class="nc">Java</span>
<span class="mf">1.8</span><span class="o">.</span><span class="mi">0_292 [...]
+ <span class="nc">Type</span> <span class="n">in</span> <span
class="n">expressions</span> <span class="n">to</span> <span
class="n">have</span> <span class="n">them</span> <span
class="n">evaluated</span><span class="o">.</span>
+ <span class="nc">Type</span> <span class="o">:</span><span
class="n">help</span> <span class="k">for</span> <span class="n">more</span>
<span class="n">information</span><span class="o">.</span>
+
+<span class="n">scala</span><span class="o">></span> <span
class="kn">import</span> <span
class="nn">org.apache.hudi.QuickstartUtils._</span>
+<span class="kn">import</span> <span
class="nn">org.apache.hudi.QuickstartUtils._</span>
+
+<span class="n">scala</span><span class="o">></span> <span
class="kn">import</span> <span
class="nn">scala.collection.JavaConversions._</span>
+<span class="kn">import</span> <span
class="nn">scala.collection.JavaConversions._</span>
+
+<span class="n">scala</span><span class="o">></span> <span
class="kn">import</span> <span class="nn">org.apache.spark.sql.SaveMode._</span>
+<span class="kn">import</span> <span
class="nn">org.apache.spark.sql.SaveMode._</span>
+
+<span class="n">scala</span><span class="o">></span> <span
class="kn">import</span> <span
class="nn">org.apache.hudi.DataSourceReadOptions._</span>
+<span class="kn">import</span> <span
class="nn">org.apache.hudi.DataSourceReadOptions._</span>
+
+<span class="n">scala</span><span class="o">></span> <span
class="kn">import</span> <span
class="nn">org.apache.hudi.DataSourceWriteOptions._</span>
+<span class="kn">import</span> <span
class="nn">org.apache.hudi.DataSourceWriteOptions._</span>
+
+<span class="n">scala</span><span class="o">></span> <span
class="kn">import</span> <span
class="nn">org.apache.hudi.config.HoodieWriteConfig._</span>
+<span class="kn">import</span> <span
class="nn">org.apache.hudi.config.HoodieWriteConfig._</span>
+
+<span class="n">scala</span><span class="o">></span> <span
class="kn">import</span> <span class="nn">org.apache.spark.sql.types._</span>
+<span class="kn">import</span> <span
class="nn">org.apache.spark.sql.types._</span>
+
+<span class="n">scala</span><span class="o">></span> <span
class="kn">import</span> <span class="nn">org.apache.spark.sql.Row</span>
+<span class="kn">import</span> <span class="nn">org.apache.spark.sql.Row</span>
+
+<span class="n">scala</span><span class="o">></span> <span
class="n">val</span> <span class="n">tableName</span> <span class="o">=</span>
<span class="s">"hudi_trips_cow"</span>
+ <span class="nl">tableName:</span> <span class="nc">String</span> <span
class="o">=</span> <span class="n">hudi_trips_cow</span>
+<span class="n">scala</span><span class="o">></span> <span
class="n">val</span> <span class="n">basePath</span> <span class="o">=</span>
<span class="s">"file:///tmp/hudi_trips_cow"</span>
+ <span class="nl">basePath:</span> <span class="nc">String</span> <span
class="o">=</span> <span class="nl">file:</span><span
class="c1">///tmp/hudi_trips_cow</span>
+<span class="n">scala</span><span class="o">></span> <span
class="n">val</span> <span class="n">schema</span> <span class="o">=</span>
<span class="nc">StructType</span><span class="o">(</span> <span
class="nc">Array</span><span class="o">(</span>
+ <span class="o">|</span> <span class="nc">StructField</span><span
class="o">(</span><span class="s">"rowId"</span><span class="o">,</span> <span
class="nc">StringType</span><span class="o">,</span><span
class="kc">true</span><span class="o">),</span>
+ <span class="o">|</span> <span class="nc">StructField</span><span
class="o">(</span><span class="s">"partitionId"</span><span class="o">,</span>
<span class="nc">StringType</span><span class="o">,</span><span
class="kc">true</span><span class="o">),</span>
+ <span class="o">|</span> <span class="nc">StructField</span><span
class="o">(</span><span class="s">"preComb"</span><span class="o">,</span>
<span class="nc">LongType</span><span class="o">,</span><span
class="kc">true</span><span class="o">),</span>
+ <span class="o">|</span> <span class="nc">StructField</span><span
class="o">(</span><span class="s">"name"</span><span class="o">,</span> <span
class="nc">StringType</span><span class="o">,</span><span
class="kc">true</span><span class="o">),</span>
+ <span class="o">|</span> <span class="nc">StructField</span><span
class="o">(</span><span class="s">"versionId"</span><span class="o">,</span>
<span class="nc">StringType</span><span class="o">,</span><span
class="kc">true</span><span class="o">),</span>
+ <span class="o">|</span> <span class="nc">StructField</span><span
class="o">(</span><span class="s">"intToLong"</span><span class="o">,</span>
<span class="nc">IntegerType</span><span class="o">,</span><span
class="kc">true</span><span class="o">)</span>
+ <span class="o">|</span> <span class="o">))</span>
+ <span class="nl">schema:</span> <span class="n">org</span><span
class="o">.</span><span class="na">apache</span><span class="o">.</span><span
class="na">spark</span><span class="o">.</span><span class="na">sql</span><span
class="o">.</span><span class="na">types</span><span class="o">.</span><span
class="na">StructType</span> <span class="o">=</span> <span
class="nc">StructType</span><span class="o">(</span><span
class="nc">StructField</span><span class="o">(</span><span class="n">ro [...]
+
+<span class="n">scala</span><span class="o">></span> <span
class="n">val</span> <span class="n">data1</span> <span class="o">=</span>
<span class="nc">Seq</span><span class="o">(</span><span
class="nc">Row</span><span class="o">(</span><span
class="s">"row_1"</span><span class="o">,</span> <span
class="s">"part_0"</span><span class="o">,</span> <span
class="mi">0L</span><span class="o">,</span> <span class="s">"bob"</span><span
class="o">,</span> <span class="s">"v_0"</span><span clas [...]
+ <span class="o">|</span> <span class="nc">Row</span><span
class="o">(</span><span class="s">"row_2"</span><span class="o">,</span> <span
class="s">"part_0"</span><span class="o">,</span> <span
class="mi">0L</span><span class="o">,</span> <span class="s">"john"</span><span
class="o">,</span> <span class="s">"v_0"</span><span class="o">,</span> <span
class="mi">0</span><span class="o">),</span>
+ <span class="o">|</span> <span class="nc">Row</span><span
class="o">(</span><span class="s">"row_3"</span><span class="o">,</span> <span
class="s">"part_0"</span><span class="o">,</span> <span
class="mi">0L</span><span class="o">,</span> <span class="s">"tom"</span><span
class="o">,</span> <span class="s">"v_0"</span><span class="o">,</span> <span
class="mi">0</span><span class="o">))</span>
+ <span class="nl">data1:</span> <span class="nc">Seq</span><span
class="o">[</span><span class="n">org</span><span class="o">.</span><span
class="na">apache</span><span class="o">.</span><span
class="na">spark</span><span class="o">.</span><span class="na">sql</span><span
class="o">.</span><span class="na">Row</span><span class="o">]</span> <span
class="o">=</span> <span class="nc">List</span><span class="o">([</span><span
class="n">row_1</span><span class="o">,</span><span class="n"> [...]
+
+<span class="n">scala</span><span class="o">></span> <span
class="kt">var</span> <span class="n">dfFromData1</span> <span
class="o">=</span> <span class="n">spark</span><span class="o">.</span><span
class="na">createDataFrame</span><span class="o">(</span><span
class="n">data1</span><span class="o">,</span> <span
class="n">schema</span><span class="o">)</span>
+<span class="n">scala</span><span class="o">></span> <span
class="n">dfFromData1</span><span class="o">.</span><span
class="na">write</span><span class="o">.</span><span
class="na">format</span><span class="o">(</span><span
class="s">"hudi"</span><span class="o">).</span>
+ <span class="o">|</span> <span class="n">options</span><span
class="o">(</span><span class="n">getQuickstartWriteConfigs</span><span
class="o">).</span>
+ <span class="o">|</span> <span class="n">option</span><span
class="o">(</span><span class="no">PRECOMBINE_FIELD_OPT_KEY</span><span
class="o">.</span><span class="na">key</span><span class="o">,</span> <span
class="s">"preComb"</span><span class="o">).</span>
+ <span class="o">|</span> <span class="n">option</span><span
class="o">(</span><span class="no">RECORDKEY_FIELD_OPT_KEY</span><span
class="o">.</span><span class="na">key</span><span class="o">,</span> <span
class="s">"rowId"</span><span class="o">).</span>
+ <span class="o">|</span> <span class="n">option</span><span
class="o">(</span><span class="no">PARTITIONPATH_FIELD_OPT_KEY</span><span
class="o">.</span><span class="na">key</span><span class="o">,</span> <span
class="s">"partitionId"</span><span class="o">).</span>
+ <span class="o">|</span> <span class="n">option</span><span
class="o">(</span><span class="s">"hoodie.index.type"</span><span
class="o">,</span><span class="s">"SIMPLE"</span><span class="o">).</span>
+ <span class="o">|</span> <span class="n">option</span><span
class="o">(</span><span class="no">TABLE_NAME</span><span
class="o">.</span><span class="na">key</span><span class="o">,</span> <span
class="n">tableName</span><span class="o">).</span>
+ <span class="o">|</span> <span class="n">mode</span><span
class="o">(</span><span class="nc">Overwrite</span><span class="o">).</span>
+ <span class="o">|</span> <span class="n">save</span><span
class="o">(</span><span class="n">basePath</span><span class="o">)</span>
+
+<span class="n">scala</span><span class="o">></span> <span
class="kt">var</span> <span class="n">tripsSnapshotDF1</span> <span
class="o">=</span> <span class="n">spark</span><span class="o">.</span><span
class="na">read</span><span class="o">.</span><span
class="na">format</span><span class="o">(</span><span
class="s">"hudi"</span><span class="o">).</span><span
class="na">load</span><span class="o">(</span><span class="n">basePath</span>
<span class="o">+</span> <span class="s">"/*/*" [...]
+ <span class="nl">tripsSnapshotDF1:</span> <span class="n">org</span><span
class="o">.</span><span class="na">apache</span><span class="o">.</span><span
class="na">spark</span><span class="o">.</span><span class="na">sql</span><span
class="o">.</span><span class="na">DataFrame</span> <span class="o">=</span>
<span class="o">[</span><span class="nl">_hoodie_commit_time:</span> <span
class="n">string</span><span class="o">,</span> <span
class="nl">_hoodie_commit_seqno:</span> <span clas [...]
+
+<span class="n">scala</span><span class="o">></span> <span
class="n">tripsSnapshotDF1</span><span class="o">.</span><span
class="na">createOrReplaceTempView</span><span class="o">(</span><span
class="s">"hudi_trips_snapshot"</span><span class="o">)</span>
+
+<span class="n">scala</span><span class="o">></span> <span
class="n">spark</span><span class="o">.</span><span class="na">sql</span><span
class="o">(</span><span class="s">"desc hudi_trips_snapshot"</span><span
class="o">).</span><span class="na">show</span><span class="o">()</span>
+ <span class="o">+--------------------+---------+-------+</span>
+ <span class="o">|</span> <span class="n">col_name</span><span
class="o">|</span><span class="n">data_type</span><span class="o">|</span><span
class="n">comment</span><span class="o">|</span>
+ <span class="o">+--------------------+---------+-------+</span>
+ <span class="o">|</span> <span class="n">_hoodie_commit_time</span><span
class="o">|</span> <span class="n">string</span><span class="o">|</span>
<span class="kc">null</span><span class="o">|</span>
+ <span class="o">|</span><span class="n">_hoodie_commit_seqno</span><span
class="o">|</span> <span class="n">string</span><span class="o">|</span>
<span class="kc">null</span><span class="o">|</span>
+ <span class="o">|</span> <span class="n">_hoodie_record_key</span><span
class="o">|</span> <span class="n">string</span><span class="o">|</span>
<span class="kc">null</span><span class="o">|</span>
+ <span class="o">|</span><span class="n">_hoodie_partition</span><span
class="o">...|</span> <span class="n">string</span><span class="o">|</span>
<span class="kc">null</span><span class="o">|</span>
+ <span class="o">|</span> <span class="n">_hoodie_file_name</span><span
class="o">|</span> <span class="n">string</span><span class="o">|</span>
<span class="kc">null</span><span class="o">|</span>
+ <span class="o">|</span> <span class="n">rowId</span><span
class="o">|</span> <span class="n">string</span><span class="o">|</span>
<span class="kc">null</span><span class="o">|</span>
+ <span class="o">|</span> <span class="n">partitionId</span><span
class="o">|</span> <span class="n">string</span><span class="o">|</span>
<span class="kc">null</span><span class="o">|</span>
+ <span class="o">|</span> <span class="n">preComb</span><span
class="o">|</span> <span class="n">bigint</span><span class="o">|</span>
<span class="kc">null</span><span class="o">|</span>
+ <span class="o">|</span> <span class="n">name</span><span
class="o">|</span> <span class="n">string</span><span class="o">|</span>
<span class="kc">null</span><span class="o">|</span>
+ <span class="o">|</span> <span class="n">versionId</span><span
class="o">|</span> <span class="n">string</span><span class="o">|</span>
<span class="kc">null</span><span class="o">|</span>
+ <span class="o">|</span> <span class="n">intToLong</span><span
class="o">|</span> <span class="kt">int</span><span class="o">|</span>
<span class="kc">null</span><span class="o">|</span>
+ <span class="o">+--------------------+---------+-------+</span>
+
+<span class="n">scala</span><span class="o">></span> <span
class="n">spark</span><span class="o">.</span><span class="na">sql</span><span
class="o">(</span><span class="s">"select rowId, partitionId, preComb, name,
versionId, intToLong from hudi_trips_snapshot"</span><span
class="o">).</span><span class="na">show</span><span class="o">()</span>
+ <span class="o">+-----+-----------+-------+----+---------+---------+</span>
+ <span class="o">|</span><span class="n">rowId</span><span
class="o">|</span><span class="n">partitionId</span><span
class="o">|</span><span class="n">preComb</span><span class="o">|</span><span
class="n">name</span><span class="o">|</span><span
class="n">versionId</span><span class="o">|</span><span
class="n">intToLong</span><span class="o">|</span>
+ <span class="o">+-----+-----------+-------+----+---------+---------+</span>
+ <span class="o">|</span><span class="n">row_3</span><span
class="o">|</span> <span class="n">part_0</span><span class="o">|</span>
<span class="mi">0</span><span class="o">|</span> <span
class="n">tom</span><span class="o">|</span> <span
class="n">v_0</span><span class="o">|</span> <span
class="mi">0</span><span class="o">|</span>
+ <span class="o">|</span><span class="n">row_2</span><span
class="o">|</span> <span class="n">part_0</span><span class="o">|</span>
<span class="mi">0</span><span class="o">|</span><span
class="n">john</span><span class="o">|</span> <span
class="n">v_0</span><span class="o">|</span> <span
class="mi">0</span><span class="o">|</span>
+ <span class="o">|</span><span class="n">row_1</span><span
class="o">|</span> <span class="n">part_0</span><span class="o">|</span>
<span class="mi">0</span><span class="o">|</span> <span
class="n">bob</span><span class="o">|</span> <span
class="n">v_0</span><span class="o">|</span> <span
class="mi">0</span><span class="o">|</span>
+ <span class="o">+-----+-----------+-------+----+---------+---------+</span>
+
+<span class="c1">// In the new schema, we add a String field and
</span>
+<span class="c1">// change the datatype of the `intToLong` field from int to long.</span>
+<span class="n">scala</span><span class="o">></span> <span
class="n">val</span> <span class="n">newSchema</span> <span class="o">=</span>
<span class="nc">StructType</span><span class="o">(</span> <span
class="nc">Array</span><span class="o">(</span>
+ <span class="o">|</span> <span class="nc">StructField</span><span
class="o">(</span><span class="s">"rowId"</span><span class="o">,</span> <span
class="nc">StringType</span><span class="o">,</span><span
class="kc">true</span><span class="o">),</span>
+ <span class="o">|</span> <span class="nc">StructField</span><span
class="o">(</span><span class="s">"partitionId"</span><span class="o">,</span>
<span class="nc">StringType</span><span class="o">,</span><span
class="kc">true</span><span class="o">),</span>
+ <span class="o">|</span> <span class="nc">StructField</span><span
class="o">(</span><span class="s">"preComb"</span><span class="o">,</span>
<span class="nc">LongType</span><span class="o">,</span><span
class="kc">true</span><span class="o">),</span>
+ <span class="o">|</span> <span class="nc">StructField</span><span
class="o">(</span><span class="s">"name"</span><span class="o">,</span> <span
class="nc">StringType</span><span class="o">,</span><span
class="kc">true</span><span class="o">),</span>
+ <span class="o">|</span> <span class="nc">StructField</span><span
class="o">(</span><span class="s">"versionId"</span><span class="o">,</span>
<span class="nc">StringType</span><span class="o">,</span><span
class="kc">true</span><span class="o">),</span>
+ <span class="o">|</span> <span class="nc">StructField</span><span
class="o">(</span><span class="s">"intToLong"</span><span class="o">,</span>
<span class="nc">LongType</span><span class="o">,</span><span
class="kc">true</span><span class="o">),</span>
+ <span class="o">|</span> <span class="nc">StructField</span><span
class="o">(</span><span class="s">"newField"</span><span class="o">,</span>
<span class="nc">StringType</span><span class="o">,</span><span
class="kc">true</span><span class="o">)</span>
+ <span class="o">|</span> <span class="o">))</span>
+ <span class="nl">newSchema:</span> <span class="n">org</span><span
class="o">.</span><span class="na">apache</span><span class="o">.</span><span
class="na">spark</span><span class="o">.</span><span class="na">sql</span><span
class="o">.</span><span class="na">types</span><span class="o">.</span><span
class="na">StructType</span> <span class="o">=</span> <span
class="nc">StructType</span><span class="o">(</span><span
class="nc">StructField</span><span class="o">(</span><span class="n" [...]
+
+<span class="n">scala</span><span class="o">></span> <span
class="n">val</span> <span class="n">data2</span> <span class="o">=</span>
<span class="nc">Seq</span><span class="o">(</span><span
class="nc">Row</span><span class="o">(</span><span
class="s">"row_2"</span><span class="o">,</span> <span
class="s">"part_0"</span><span class="o">,</span> <span
class="mi">5L</span><span class="o">,</span> <span class="s">"john"</span><span
class="o">,</span> <span class="s">"v_3"</span><span cla [...]
+ <span class="o">|</span> <span class="nc">Row</span><span
class="o">(</span><span class="s">"row_5"</span><span class="o">,</span> <span
class="s">"part_0"</span><span class="o">,</span> <span
class="mi">5L</span><span class="o">,</span> <span
class="s">"maroon"</span><span class="o">,</span> <span
class="s">"v_2"</span><span class="o">,</span> <span class="mi">2L</span><span
class="o">,</span> <span class="s">"newField_1"</span><span class="o">),</span>
+ <span class="o">|</span> <span class="nc">Row</span><span
class="o">(</span><span class="s">"row_9"</span><span class="o">,</span> <span
class="s">"part_0"</span><span class="o">,</span> <span
class="mi">5L</span><span class="o">,</span> <span
class="s">"michael"</span><span class="o">,</span> <span
class="s">"v_2"</span><span class="o">,</span> <span class="mi">2L</span><span
class="o">,</span> <span class="s">"newField_1"</span><span class="o">))</span>
+ <span class="nl">data2:</span> <span class="nc">Seq</span><span
class="o">[</span><span class="n">org</span><span class="o">.</span><span
class="na">apache</span><span class="o">.</span><span
class="na">spark</span><span class="o">.</span><span class="na">sql</span><span
class="o">.</span><span class="na">Row</span><span class="o">]</span> <span
class="o">=</span> <span class="nc">List</span><span class="o">([</span><span
class="n">row_2</span><span class="o">,</span><span class="n"> [...]
+
+<span class="n">scala</span><span class="o">></span> <span
class="kt">var</span> <span class="n">dfFromData2</span> <span
class="o">=</span> <span class="n">spark</span><span class="o">.</span><span
class="na">createDataFrame</span><span class="o">(</span><span
class="n">data2</span><span class="o">,</span> <span
class="n">newSchema</span><span class="o">)</span>
+<span class="n">scala</span><span class="o">></span> <span
class="n">dfFromData2</span><span class="o">.</span><span
class="na">write</span><span class="o">.</span><span
class="na">format</span><span class="o">(</span><span
class="s">"hudi"</span><span class="o">).</span>
+ <span class="o">|</span> <span class="n">options</span><span
class="o">(</span><span class="n">getQuickstartWriteConfigs</span><span
class="o">).</span>
+ <span class="o">|</span> <span class="n">option</span><span
class="o">(</span><span class="no">PRECOMBINE_FIELD_OPT_KEY</span><span
class="o">.</span><span class="na">key</span><span class="o">,</span> <span
class="s">"preComb"</span><span class="o">).</span>
+ <span class="o">|</span> <span class="n">option</span><span
class="o">(</span><span class="no">RECORDKEY_FIELD_OPT_KEY</span><span
class="o">.</span><span class="na">key</span><span class="o">,</span> <span
class="s">"rowId"</span><span class="o">).</span>
+ <span class="o">|</span> <span class="n">option</span><span
class="o">(</span><span class="no">PARTITIONPATH_FIELD_OPT_KEY</span><span
class="o">.</span><span class="na">key</span><span class="o">,</span> <span
class="s">"partitionId"</span><span class="o">).</span>
+ <span class="o">|</span> <span class="n">option</span><span
class="o">(</span><span class="s">"hoodie.index.type"</span><span
class="o">,</span><span class="s">"SIMPLE"</span><span class="o">).</span>
+ <span class="o">|</span> <span class="n">option</span><span
class="o">(</span><span class="no">TABLE_NAME</span><span
class="o">.</span><span class="na">key</span><span class="o">,</span> <span
class="n">tableName</span><span class="o">).</span>
+ <span class="o">|</span> <span class="n">mode</span><span
class="o">(</span><span class="nc">Append</span><span class="o">).</span>
+ <span class="o">|</span> <span class="n">save</span><span
class="o">(</span><span class="n">basePath</span><span class="o">)</span>
+
+<span class="n">scala</span><span class="o">></span> <span
class="kt">var</span> <span class="n">tripsSnapshotDF2</span> <span
class="o">=</span> <span class="n">spark</span><span class="o">.</span><span
class="na">read</span><span class="o">.</span><span
class="na">format</span><span class="o">(</span><span
class="s">"hudi"</span><span class="o">).</span><span
class="na">load</span><span class="o">(</span><span class="n">basePath</span>
<span class="o">+</span> <span class="s">"/*/*" [...]
+ <span class="nl">tripsSnapshotDF2:</span> <span class="n">org</span><span
class="o">.</span><span class="na">apache</span><span class="o">.</span><span
class="na">spark</span><span class="o">.</span><span class="na">sql</span><span
class="o">.</span><span class="na">DataFrame</span> <span class="o">=</span>
<span class="o">[</span><span class="nl">_hoodie_commit_time:</span> <span
class="n">string</span><span class="o">,</span> <span
class="nl">_hoodie_commit_seqno:</span> <span clas [...]
+
+<span class="n">scala</span><span class="o">></span> <span
class="n">tripsSnapshotDF2</span><span class="o">.</span><span
class="na">createOrReplaceTempView</span><span class="o">(</span><span
class="s">"hudi_trips_snapshot"</span><span class="o">)</span>
+
+<span class="n">scala</span><span class="o">></span> <span
class="n">spark</span><span class="o">.</span><span class="na">sql</span><span
class="o">(</span><span class="s">"desc hudi_trips_snapshot"</span><span
class="o">).</span><span class="na">show</span><span class="o">()</span>
+ <span class="o">+--------------------+---------+-------+</span>
+ <span class="o">|</span> <span class="n">col_name</span><span
class="o">|</span><span class="n">data_type</span><span class="o">|</span><span
class="n">comment</span><span class="o">|</span>
+ <span class="o">+--------------------+---------+-------+</span>
+ <span class="o">|</span> <span class="n">_hoodie_commit_time</span><span class="o">|</span> <span class="n">string</span><span class="o">|</span> <span class="kc">null</span><span class="o">|</span>
+ <span class="o">|</span><span class="n">_hoodie_commit_seqno</span><span class="o">|</span> <span class="n">string</span><span class="o">|</span> <span class="kc">null</span><span class="o">|</span>
+ <span class="o">|</span> <span class="n">_hoodie_record_key</span><span class="o">|</span> <span class="n">string</span><span class="o">|</span> <span class="kc">null</span><span class="o">|</span>
+ <span class="o">|</span><span class="n">_hoodie_partition</span><span class="o">...|</span> <span class="n">string</span><span class="o">|</span> <span class="kc">null</span><span class="o">|</span>
+ <span class="o">|</span> <span class="n">_hoodie_file_name</span><span class="o">|</span> <span class="n">string</span><span class="o">|</span> <span class="kc">null</span><span class="o">|</span>
+ <span class="o">|</span> <span class="n">rowId</span><span class="o">|</span> <span class="n">string</span><span class="o">|</span> <span class="kc">null</span><span class="o">|</span>
+ <span class="o">|</span> <span class="n">partitionId</span><span class="o">|</span> <span class="n">string</span><span class="o">|</span> <span class="kc">null</span><span class="o">|</span>
+ <span class="o">|</span> <span class="n">preComb</span><span class="o">|</span> <span class="n">bigint</span><span class="o">|</span> <span class="kc">null</span><span class="o">|</span>
+ <span class="o">|</span> <span class="n">name</span><span class="o">|</span> <span class="n">string</span><span class="o">|</span> <span class="kc">null</span><span class="o">|</span>
+ <span class="o">|</span> <span class="n">versionId</span><span class="o">|</span> <span class="n">string</span><span class="o">|</span> <span class="kc">null</span><span class="o">|</span>
+ <span class="o">|</span> <span class="n">intToLong</span><span class="o">|</span> <span class="n">bigint</span><span class="o">|</span> <span class="kc">null</span><span class="o">|</span>
+ <span class="o">|</span> <span class="n">newField</span><span class="o">|</span> <span class="n">string</span><span class="o">|</span> <span class="kc">null</span><span class="o">|</span>
+ <span class="o">+--------------------+---------+-------+</span>
+
+
+<span class="n">scala</span><span class="o">></span> <span class="n">spark</span><span class="o">.</span><span class="na">sql</span><span class="o">(</span><span class="s">"select rowId, partitionId, preComb, name, versionId, intToLong, newField from hudi_trips_snapshot"</span><span class="o">).</span><span class="na">show</span><span class="o">()</span>
+ <span class="o">+-----+-----------+-------+-------+---------+---------+----------+</span>
+ <span class="o">|</span><span class="n">rowId</span><span class="o">|</span><span class="n">partitionId</span><span class="o">|</span><span class="n">preComb</span><span class="o">|</span> <span class="n">name</span><span class="o">|</span><span class="n">versionId</span><span class="o">|</span><span class="n">intToLong</span><span class="o">|</span> <span class="n">newField</span><span class="o">|</span>
+ <span class="o">+-----+-----------+-------+-------+---------+---------+----------+</span>
+ <span class="o">|</span><span class="n">row_3</span><span class="o">|</span> <span class="n">part_0</span><span class="o">|</span> <span class="mi">0</span><span class="o">|</span> <span class="n">tom</span><span class="o">|</span> <span class="n">v_0</span><span class="o">|</span> <span class="mi">0</span><span class="o">|</span> <span class="kc">null</span><span class="o">|</span>
+ <span class="o">|</span><span class="n">row_2</span><span class="o">|</span> <span class="n">part_0</span><span class="o">|</span> <span class="mi">5</span><span class="o">|</span> <span class="n">john</span><span class="o">|</span> <span class="n">v_3</span><span class="o">|</span> <span class="mi">3</span><span class="o">|</span><span class="n">newField_1</span><span class="o">|</span>
+ <span class="o">|</span><span class="n">row_1</span><span class="o">|</span> <span class="n">part_0</span><span class="o">|</span> <span class="mi">0</span><span class="o">|</span> <span class="n">bob</span><span class="o">|</span> <span class="n">v_0</span><span class="o">|</span> <span class="mi">0</span><span class="o">|</span> <span class="kc">null</span><span class="o">|</span>
+ <span class="o">|</span><span class="n">row_5</span><span class="o">|</span> <span class="n">part_0</span><span class="o">|</span> <span class="mi">5</span><span class="o">|</span> <span class="n">maroon</span><span class="o">|</span> <span class="n">v_2</span><span class="o">|</span> <span class="mi">2</span><span class="o">|</span><span class="n">newField_1</span><span class="o">|</span>
+ <span class="o">|</span><span class="n">row_9</span><span class="o">|</span> <span class="n">part_0</span><span class="o">|</span> <span class="mi">5</span><span class="o">|</span><span class="n">michael</span><span class="o">|</span> <span class="n">v_2</span><span class="o">|</span> <span class="mi">2</span><span class="o">|</span><span class="n">newField_1</span><span class="o">|</span>
+ <span class="o">+-----+-----------+-------+-------+---------+---------+----------+</span>
+
+</code></pre></div></div>
+
</section>
<a href="#masthead__inner-wrap" class="back-to-top">Back to top
↑</a>