This is an automated email from the ASF dual-hosted git repository.
vinoth pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git
The following commit(s) were added to refs/heads/asf-site by this push:
new b3d0f15 Travis CI build asf-site
b3d0f15 is described below
commit b3d0f15d447212ca72b0fda79a687a618196f6f5
Author: CI <[email protected]>
AuthorDate: Thu Aug 13 06:43:57 2020 +0000
Travis CI build asf-site
---
content/docs/writing_data.html | 34 ++++++++++++++++++++++++++++++++++
1 file changed, 34 insertions(+)
diff --git a/content/docs/writing_data.html b/content/docs/writing_data.html
index e229106..d18be96 100644
--- a/content/docs/writing_data.html
+++ b/content/docs/writing_data.html
@@ -368,6 +368,7 @@
<ul class="toc__menu">
<li><a href="#write-operations">Write Operations</a></li>
<li><a href="#deltastreamer">DeltaStreamer</a></li>
+ <li><a href="#multitabledeltastreamer">MultiTableDeltaStreamer</a></li>
<li><a href="#datasource-writer">Datasource Writer</a></li>
<li><a href="#syncing-to-hive">Syncing to Hive</a></li>
<li><a href="#deletes">Deletes</a></li>
@@ -541,6 +542,39 @@ provided under <code
class="highlighter-rouge">hudi-utilities/src/test/resources
<p>In some cases, you may want to migrate your existing table into Hudi
beforehand. Please refer to <a href="/docs/migration_guide.html">migration
guide</a>.</p>
+<h2 id="multitabledeltastreamer">MultiTableDeltaStreamer</h2>
+
+<p><code class="highlighter-rouge">HoodieMultiTableDeltaStreamer</code>, a
wrapper on top of <code class="highlighter-rouge">HoodieDeltaStreamer</code>,
enables ingesting multiple tables into Hudi datasets in a single run.
Currently it supports only sequential processing of the tables to be ingested
and only the COPY_ON_WRITE storage type. The command line options for <code
class="highlighter-rouge">HoodieMultiTableDeltaStreamer</code> are very
similar to <code class="highlighter-rouge">Hoo [...]
+
+<div class="language-java highlighter-rouge"><div class="highlight"><pre
class="highlight"><code> <span class="o">*</span> <span
class="o">--</span><span class="n">config</span><span class="o">-</span><span
class="n">folder</span>
+ <span class="n">the</span> <span class="n">path</span> <span
class="n">to</span> <span class="n">the</span> <span class="n">folder</span>
<span class="n">which</span> <span class="n">contains</span> <span
class="n">all</span> <span class="n">the</span> <span class="n">table</span>
<span class="n">wise</span> <span class="n">config</span> <span
class="n">files</span>
+ <span class="o">--</span><span class="n">base</span><span
class="o">-</span><span class="n">path</span><span class="o">-</span><span
class="n">prefix</span>
+ <span class="k">this</span> <span class="n">is</span> <span
class="n">added</span> <span class="n">to</span> <span class="n">enable</span>
<span class="n">users</span> <span class="n">to</span> <span
class="n">create</span> <span class="n">all</span> <span class="n">the</span>
<span class="n">hudi</span> <span class="n">datasets</span> <span
class="k">for</span> <span class="n">related</span> <span
class="n">tables</span> <span class="n">under</span> <span class="n">one</span>
<span [...]
+</code></pre></div></div>
+
+<p>The following properties must be set correctly to ingest data using
<code class="highlighter-rouge">HoodieMultiTableDeltaStreamer</code>.</p>
+
+<div class="language-java highlighter-rouge"><div class="highlight"><pre
class="highlight"><code><span class="n">hoodie</span><span
class="o">.</span><span class="na">deltastreamer</span><span
class="o">.</span><span class="na">ingestion</span><span
class="o">.</span><span class="na">tablesToBeIngested</span>
+ <span class="n">comma</span> <span class="n">separated</span> <span
class="n">names</span> <span class="n">of</span> <span class="n">tables</span>
<span class="n">to</span> <span class="n">be</span> <span
class="n">ingested</span> <span class="n">in</span> <span class="n">the</span>
<span class="n">format</span> <span class="o"><</span><span
class="n">database</span><span class="o">>.<</span><span
class="n">table</span><span class="o">>,</span> <span class="k">for</span>
<s [...]
+<span class="n">hoodie</span><span class="o">.</span><span
class="na">deltastreamer</span><span class="o">.</span><span
class="na">ingestion</span><span class="o">.</span><span
class="na">targetBasePath</span>
+ <span class="k">if</span> <span class="n">you</span> <span
class="n">wish</span> <span class="n">to</span> <span class="n">ingest</span>
<span class="n">a</span> <span class="n">particular</span> <span
class="n">table</span> <span class="n">in</span> <span class="n">a</span> <span
class="n">separate</span> <span class="n">path</span><span class="o">,</span>
<span class="n">you</span> <span class="n">can</span> <span
class="n">mention</span> <span class="n">that</span> <span class="n">p [...]
+<span class="n">hoodie</span><span class="o">.</span><span
class="na">deltastreamer</span><span class="o">.</span><span
class="na">ingestion</span><span class="o">.<</span><span
class="n">database</span><span class="o">>.<</span><span
class="n">table</span><span class="o">>.</span><span
class="na">configFile</span>
+ <span class="n">path</span> <span class="n">to</span> <span
class="n">the</span> <span class="n">config</span> <span class="n">file</span>
<span class="n">in</span> <span class="n">dedicated</span> <span
class="n">config</span> <span class="n">folder</span> <span
class="n">which</span> <span class="n">contains</span> <span
class="n">table</span> <span class="n">overridden</span> <span
class="n">properties</span> <span class="k">for</span> <span
class="n">the</span> <span class="n">part [...]
+</code></pre></div></div>
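+<p>As an illustration, the common properties and a table-level config file might
look as follows. This is only a sketch: the database, table, topic, and field
names are hypothetical placeholders, and the per-table file shows how a table can
override write properties such as its record key and partition path.</p>
+
+<div class="language-java highlighter-rouge"><div class="highlight"><pre
class="highlight"><code># common properties (sketch; database/table names are placeholders)
+hoodie.deltastreamer.ingestion.tablesToBeIngested=db1.table1,db1.table2
+hoodie.deltastreamer.ingestion.db1.table1.configFile=file://tmp/hudi-ingestion-config/table1_config.properties
+
+# table1_config.properties (table-level overrides; field names are illustrative)
+hoodie.datasource.write.recordkey.field=_row_key
+hoodie.datasource.write.partitionpath.field=created_at
+</code></pre></div></div>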
+
+<p>Sample config files for table-wise overridden properties can be found under
<code
class="highlighter-rouge">hudi-utilities/src/test/resources/delta-streamer-config</code>.
The command to run <code
class="highlighter-rouge">HoodieMultiTableDeltaStreamer</code> is also similar
to how you run <code class="highlighter-rouge">HoodieDeltaStreamer</code>.</p>
+
+<div class="language-java highlighter-rouge"><div class="highlight"><pre
class="highlight"><code><span class="o">[</span><span
class="n">hoodie</span><span class="o">]</span><span class="err">$</span> <span
class="n">spark</span><span class="o">-</span><span class="n">submit</span>
<span class="o">--</span><span class="kd">class</span> <span
class="nc">org</span><span class="o">.</span><span
class="na">apache</span><span class="o">.</span><span
class="na">hudi</span><span class="o">.</sp [...]
+ <span class="o">--</span><span class="n">props</span> <span
class="nl">file:</span><span
class="c1">//${PWD}/hudi-utilities/src/test/resources/delta-streamer-config/kafka-source.properties
\</span>
+ <span class="o">--</span><span class="n">config</span><span
class="o">-</span><span class="n">folder</span> <span
class="nl">file:</span><span class="c1">//tmp/hudi-ingestion-config \</span>
+ <span class="o">--</span><span class="n">schemaprovider</span><span
class="o">-</span><span class="kd">class</span> <span
class="nc">org</span><span class="o">.</span><span
class="na">apache</span><span class="o">.</span><span
class="na">hudi</span><span class="o">.</span><span
class="na">utilities</span><span class="o">.</span><span
class="na">schema</span><span class="o">.</span><span
class="na">SchemaRegistryProvider</span> <span class="err">\</span>
+ <span class="o">--</span><span class="n">source</span><span
class="o">-</span><span class="kd">class</span> <span
class="nc">org</span><span class="o">.</span><span
class="na">apache</span><span class="o">.</span><span
class="na">hudi</span><span class="o">.</span><span
class="na">utilities</span><span class="o">.</span><span
class="na">sources</span><span class="o">.</span><span
class="na">AvroKafkaSource</span> <span class="err">\</span>
+ <span class="o">--</span><span class="n">source</span><span
class="o">-</span><span class="n">ordering</span><span class="o">-</span><span
class="n">field</span> <span class="n">impressiontime</span> <span
class="err">\</span>
+ <span class="o">--</span><span class="n">base</span><span
class="o">-</span><span class="n">path</span><span class="o">-</span><span
class="n">prefix</span> <span class="nl">file:</span><span
class="err">\</span><span class="o">/</span><span class="err">\</span><span
class="o">/</span><span class="err">\</span><span class="o">/</span><span
class="n">tmp</span><span class="o">/</span><span class="n">hudi</span><span
class="o">-</span><span class="n">deltastreamer</span><span class="o">- [...]
+ <span class="o">--</span><span class="n">target</span><span
class="o">-</span><span class="n">table</span> <span class="n">uber</span><span
class="o">.</span><span class="na">impressions</span> <span class="err">\</span>
+ <span class="o">--</span><span class="n">op</span> <span
class="no">BULK_INSERT</span>
+</code></pre></div></div>
+
<h2 id="datasource-writer">Datasource Writer</h2>
<p>The <code class="highlighter-rouge">hudi-spark</code> module offers the
DataSource API to write (and read) a Spark DataFrame into a Hudi table. There
are a number of options available:</p>