This is an automated email from the ASF dual-hosted git repository.
vinoth pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git
The following commit(s) were added to refs/heads/asf-site by this push:
new bb21468 Travis CI build asf-site
bb21468 is described below
commit bb2146874252a090c77f17bce7207402d30946d0
Author: CI <[email protected]>
AuthorDate: Thu Feb 4 19:49:01 2021 +0000
Travis CI build asf-site
---
content/docs/configurations.html | 14 +++++++-------
content/docs/docker_demo.html | 4 ++--
content/docs/writing_data.html | 6 +++---
3 files changed, 12 insertions(+), 12 deletions(-)
diff --git a/content/docs/configurations.html b/content/docs/configurations.html
index 63ca359..240179d 100644
--- a/content/docs/configurations.html
+++ b/content/docs/configurations.html
@@ -627,13 +627,17 @@ HoodieWriteConfig can be built using a builder pattern as
below.</p>
<h4 id="bloom-index-configs">Bloom Index configs</h4>
+<h4 id="bloomIndexFilterType">bloomIndexFilterType(bucketizedChecking =
BloomFilterTypeCode.SIMPLE)</h4>
+<p>Property: <code
class="highlighter-rouge">hoodie.bloom.index.filter.type</code> <br />
+<span style="color:grey">Filter type used. Default is
BloomFilterTypeCode.SIMPLE. Available values are [BloomFilterTypeCode.SIMPLE ,
BloomFilterTypeCode.DYNAMIC_V0]. Dynamic bloom filters auto size themselves
based on number of keys.</span></p>
+
<h4 id="bloomFilterNumEntries">bloomFilterNumEntries(numEntries = 60000)</h4>
<p>Property: <code
class="highlighter-rouge">hoodie.index.bloom.num_entries</code> <br />
-<span style="color:grey">Only applies if index type is BLOOM. <br />This is
the number of entries to be stored in the bloom filter. We assume the
maxParquetFileSize is 128MB and averageRecordSize is 1024B and hence we approx
a total of 130K records in a file. The default (60000) is roughly half of this
approximation. <a
href="https://issues.apache.org/jira/browse/HUDI-56">HUDI-56</a> tracks
computing this dynamically. Warning: Setting this very low, will generate a lot
of false positives [...]
+<span style="color:grey">Only applies if index type is BLOOM. <br />This is
the number of entries to be stored in the bloom filter. We assume the
maxParquetFileSize is 128MB and averageRecordSize is 1024B and hence we approx
a total of 130K records in a file. The default (60000) is roughly half of this
approximation. <a
href="https://issues.apache.org/jira/browse/HUDI-56">HUDI-56</a> tracks
computing this dynamically. Warning: Setting this very low, will generate a lot
of false positives [...]
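The arithmetic behind that default can be spelled out. A minimal sketch (my own, not Hudi code; the 128MB/1024B figures come from the paragraph above):

```java
// Arithmetic behind the default: with an assumed 128MB parquet file size and
// a 1024B average record size, one file holds roughly 130K records, and the
// default of 60000 entries is about half of that.
public class NumEntriesEstimate {
    static long approxRecordsPerFile(long maxParquetFileSizeBytes, long averageRecordSizeBytes) {
        return maxParquetFileSizeBytes / averageRecordSizeBytes;
    }
    public static void main(String[] args) {
        long records = approxRecordsPerFile(128L * 1024 * 1024, 1024);
        System.out.println(records); // 131072, i.e. ~130K
    }
}
```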
<h4 id="bloomFilterFPP">bloomFilterFPP(fpp = 0.000000001)</h4>
<p>Property: <code class="highlighter-rouge">hoodie.index.bloom.fpp</code> <br
/>
-<span style="color:grey">Only applies if index type is BLOOM. <br /> Error
rate allowed given the number of entries. This is used to calculate how many
bits should be assigned for the bloom filter and the number of hash functions.
This is usually set very low (default: 0.000000001), we like to tradeoff disk
space for lower false positives</span></p>
+<span style="color:grey">Only applies if index type is BLOOM. <br /> Error
rate allowed given the number of entries. This is used to calculate how many
bits should be assigned for the bloom filter and the number of hash functions.
This is usually set very low (default: 0.000000001), we like to tradeoff disk
space for lower false positives. If the number of entries added to bloom filter
exceeds the congfigured value (<code
class="highlighter-rouge">hoodie.index.bloom.num_entries</code>), [...]
<h4 id="bloomIndexParallelism">bloomIndexParallelism(0)</h4>
<p>Property: <code
class="highlighter-rouge">hoodie.bloom.index.parallelism</code> <br />
@@ -641,7 +645,7 @@ HoodieWriteConfig can be built using a builder pattern as
below.</p>
<h4 id="bloomIndexPruneByRanges">bloomIndexPruneByRanges(pruneRanges =
true)</h4>
<p>Property: <code
class="highlighter-rouge">hoodie.bloom.index.prune.by.ranges</code> <br />
-<span style="color:grey">Only applies if index type is BLOOM. <br /> When
true, range information from files to leveraged speed up index lookups.
Particularly helpful, if the key has a monotonously increasing prefix, such as
timestamp.</span></p>
+<span style="color:grey">Only applies if index type is BLOOM. <br /> When
true, range information from files to leveraged speed up index lookups.
Particularly helpful, if the key has a monotonously increasing prefix, such as
timestamp. If the record key is completely random, it is better to turn this
off.</span></p>
<h4 id="bloomIndexUseCaching">bloomIndexUseCaching(useCaching = true)</h4>
<p>Property: <code
class="highlighter-rouge">hoodie.bloom.index.use.caching</code> <br />
@@ -655,10 +659,6 @@ HoodieWriteConfig can be built using a builder pattern as
below.</p>
<p>Property: <code
class="highlighter-rouge">hoodie.bloom.index.bucketized.checking</code> <br />
<span style="color:grey">Only applies if index type is BLOOM. <br /> When
true, bucketized bloom filtering is enabled. This reduces skew seen in sort
based bloom index lookup</span></p>
-<h4 id="bloomIndexFilterType">bloomIndexFilterType(bucketizedChecking =
BloomFilterTypeCode.SIMPLE)</h4>
-<p>Property: <code
class="highlighter-rouge">hoodie.bloom.index.filter.type</code> <br />
-<span style="color:grey">Filter type used. Default is
BloomFilterTypeCode.SIMPLE. Available values are [BloomFilterTypeCode.SIMPLE ,
BloomFilterTypeCode.DYNAMIC_V0]. Dynamic bloom filters auto size themselves
based on number of keys</span></p>
-
<h4
id="bloomIndexFilterDynamicMaxEntries">bloomIndexFilterDynamicMaxEntries(maxNumberOfEntries
= 100000)</h4>
<p>Property: <code
class="highlighter-rouge">hoodie.bloom.index.filter.dynamic.max.entries</code>
<br />
<span style="color:grey">The threshold for the maximum number of keys to
record in a dynamic Bloom filter row. Only applies if filter type is
BloomFilterTypeCode.DYNAMIC_V0.</span></p>
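The idea of a dynamic filter that "auto-sizes" up to a per-row max-entries threshold can be illustrated with a small conceptual sketch. This is not Hudi's DYNAMIC_V0 implementation (which uses real bloom-filter rows, not sets); it only shows the growth behavior the config controls:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Conceptual sketch only: a "dynamic" filter that starts a fresh
// fixed-capacity row once the current row reaches the configured
// max-entries threshold, so the structure grows with the number of keys.
public class DynamicFilterSketch {
    private final int maxEntriesPerRow;
    private final List<Set<String>> rows = new ArrayList<>();

    DynamicFilterSketch(int maxEntriesPerRow) {
        this.maxEntriesPerRow = maxEntriesPerRow;
        rows.add(new HashSet<>());
    }
    void add(String key) {
        Set<String> current = rows.get(rows.size() - 1);
        if (current.size() >= maxEntriesPerRow) {
            current = new HashSet<>();
            rows.add(current);  // grow: open a new row
        }
        current.add(key);
    }
    boolean mightContain(String key) {
        for (Set<String> row : rows) if (row.contains(key)) return true;
        return false;
    }
    int numRows() { return rows.size(); }
}
```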
diff --git a/content/docs/docker_demo.html b/content/docs/docker_demo.html
index 516f54f..66f3dfc 100644
--- a/content/docs/docker_demo.html
+++ b/content/docs/docker_demo.html
@@ -573,7 +573,7 @@ inorder to run Hive queries against those tables.</p>
<div class="language-java highlighter-rouge"><div class="highlight"><pre
class="highlight"><code><span class="n">docker</span> <span
class="n">exec</span> <span class="o">-</span><span class="n">it</span> <span
class="n">adhoc</span><span class="o">-</span><span class="mi">2</span> <span
class="o">/</span><span class="n">bin</span><span class="o">/</span><span
class="n">bash</span>
<span class="err">#</span> <span class="nc">THis</span> <span
class="n">command</span> <span class="n">takes</span> <span class="n">in</span>
<span class="nc">HIveServer</span> <span class="no">URL</span> <span
class="n">and</span> <span class="no">COW</span> <span class="nc">Hudi</span>
<span class="n">table</span> <span class="n">location</span> <span
class="n">in</span> <span class="no">HDFS</span> <span class="n">and</span>
<span class="n">sync</span> <span class="n">the</span> <span [...]
-<span class="o">/</span><span class="kt">var</span><span
class="o">/</span><span class="n">hoodie</span><span class="o">/</span><span
class="n">ws</span><span class="o">/</span><span class="n">hudi</span><span
class="o">-</span><span class="n">hive</span><span class="o">-</span><span
class="n">sync</span><span class="o">/</span><span
class="n">run_sync_tool</span><span class="o">.</span><span
class="na">sh</span> <span class="err">\</span>
+<span class="o">/</span><span class="kt">var</span><span
class="o">/</span><span class="n">hoodie</span><span class="o">/</span><span
class="n">ws</span><span class="o">/</span><span class="n">hudi</span><span
class="o">-</span><span class="n">sync</span><span class="o">/</span><span
class="n">hudi</span><span class="o">-</span><span class="n">hive</span><span
class="o">-</span><span class="n">sync</span><span class="o">/</span><span
class="n">run_sync_tool</span><span class="o">.</span> [...]
<span class="o">--</span><span class="n">jdbc</span><span
class="o">-</span><span class="n">url</span> <span
class="nl">jdbc:hive2:</span><span class="c1">//hiveserver:10000 \</span>
<span class="o">--</span><span class="n">user</span> <span
class="n">hive</span> <span class="err">\</span>
<span class="o">--</span><span class="n">pass</span> <span
class="n">hive</span> <span class="err">\</span>
@@ -586,7 +586,7 @@ inorder to run Hive queries against those tables.</p>
<span class="o">.....</span>
<span class="err">#</span> <span class="nc">Now</span> <span
class="n">run</span> <span class="n">hive</span><span class="o">-</span><span
class="n">sync</span> <span class="k">for</span> <span class="n">the</span>
<span class="n">second</span> <span class="n">data</span><span
class="o">-</span><span class="n">set</span> <span class="n">in</span> <span
class="no">HDFS</span> <span class="n">using</span> <span
class="nc">Merge</span><span class="o">-</span><span class="nc">On</span><span
[...]
-<span class="o">/</span><span class="kt">var</span><span
class="o">/</span><span class="n">hoodie</span><span class="o">/</span><span
class="n">ws</span><span class="o">/</span><span class="n">hudi</span><span
class="o">-</span><span class="n">hive</span><span class="o">-</span><span
class="n">sync</span><span class="o">/</span><span
class="n">run_sync_tool</span><span class="o">.</span><span
class="na">sh</span> <span class="err">\</span>
+<span class="o">/</span><span class="kt">var</span><span
class="o">/</span><span class="n">hoodie</span><span class="o">/</span><span
class="n">ws</span><span class="o">/</span><span class="n">hudi</span><span
class="o">-</span><span class="n">sync</span><span class="o">/</span><span
class="n">hudi</span><span class="o">-</span><span class="n">hive</span><span
class="o">-</span><span class="n">sync</span><span class="o">/</span><span
class="n">run_sync_tool</span><span class="o">.</span> [...]
<span class="o">--</span><span class="n">jdbc</span><span
class="o">-</span><span class="n">url</span> <span
class="nl">jdbc:hive2:</span><span class="c1">//hiveserver:10000 \</span>
<span class="o">--</span><span class="n">user</span> <span
class="n">hive</span> <span class="err">\</span>
<span class="o">--</span><span class="n">pass</span> <span
class="n">hive</span> <span class="err">\</span>
diff --git a/content/docs/writing_data.html b/content/docs/writing_data.html
index 8475a3e..0ab4f2b 100644
--- a/content/docs/writing_data.html
+++ b/content/docs/writing_data.html
@@ -556,13 +556,13 @@ provided under <code
class="highlighter-rouge">hudi-utilities/src/test/resources
<p><strong><code
class="highlighter-rouge">DataSourceWriteOptions</code></strong>:</p>
-<p><strong>RECORDKEY_FIELD_OPT_KEY</strong> (Required): Primary key field(s).
Nested fields can be specified using the dot notation eg: <code
class="highlighter-rouge">a.b.c</code>. When using multiple columns as primary
key use comma separated notation, eg: <code
class="highlighter-rouge">"col1,col2,col3,etc"</code>. Single or multiple
columns as primary key specified by <code
class="highlighter-rouge">KEYGENERATOR_CLASS_OPT_KEY</code> property.<br />
+<p><strong>RECORDKEY_FIELD_OPT_KEY</strong> (Required): Primary key field(s).
Record keys uniquely identify a record/row within each partition. If you need
global uniqueness, there are two options: you can either make the dataset
non-partitioned, or leverage global indexes to ensure record keys are unique
irrespective of the partition path. Record keys can either be a single column
or refer to multiple columns. <code
class="highlighter-rouge">KEYGENERATOR_CLASS_OPT_ [...]
Default value: <code class="highlighter-rouge">"uuid"</code><br /></p>
-<p><strong>PARTITIONPATH_FIELD_OPT_KEY</strong> (Required): Columns to be used
for partitioning the table. To prevent partitioning, provide empty string as
value eg: <code class="highlighter-rouge">""</code>. Specify partitioning/no
partitioning using <code
class="highlighter-rouge">KEYGENERATOR_CLASS_OPT_KEY</code>. If synchronizing
to hive, also specify using <code
class="highlighter-rouge">HIVE_PARTITION_EXTRACTOR_CLASS_OPT_KEY.</code><br />
+<p><strong>PARTITIONPATH_FIELD_OPT_KEY</strong> (Required): Columns to be used
for partitioning the table. To prevent partitioning, provide an empty string as
the value, e.g. <code class="highlighter-rouge">""</code>. Specify
partitioning/no partitioning using <code
class="highlighter-rouge">KEYGENERATOR_CLASS_OPT_KEY</code>. If the partition
path needs to be URL-encoded, you can set <code
class="highlighter-rouge">URL_ENCODE_PARTITIONING_OPT_KEY</code>. If
synchronizing to Hive, also specify using < [...]
Default value: <code class="highlighter-rouge">"partitionpath"</code><br /></p>
-<p><strong>PRECOMBINE_FIELD_OPT_KEY</strong> (Required): When two records have
the same key value, the record with the largest value from the field specified
will be choosen.<br />
+<p><strong>PRECOMBINE_FIELD_OPT_KEY</strong> (Required): When two records
within the same batch have the same key value, the record with the largest
value for the specified field will be chosen. If you are using the default
payload, OverwriteWithLatestAvroPayload, for HoodieRecordPayload (<code
class="highlighter-rouge">WRITE_PAYLOAD_CLASS</code>), an incoming record will
always take precedence over the one in storage, ignoring this <code
class="highlighter-rouge">PRECOMBINE_FIELD [...]
Default value: <code class="highlighter-rouge">"ts"</code><br /></p>
<p><strong>OPERATION_OPT_KEY</strong>: The <a href="#write-operations">write
operations</a> to use.<br />