This is an automated email from the ASF dual-hosted git repository.
vinoth pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git
The following commit(s) were added to refs/heads/asf-site by this push:
new bb21468 Travis CI build asf-site
bb21468 is described below
commit bb2146874252a090c77f17bce7207402d30946d0
Author: CI <[email protected]>
AuthorDate: Thu Feb 4 19:49:01 2021 +0000
Travis CI build asf-site
---
content/docs/configurations.html | 14 +++++++-------
content/docs/docker_demo.html | 4 ++--
content/docs/writing_data.html | 6 +++---
3 files changed, 12 insertions(+), 12 deletions(-)
diff --git a/content/docs/configurations.html b/content/docs/configurations.html
index 63ca359..240179d 100644
--- a/content/docs/configurations.html
+++ b/content/docs/configurations.html
@@ -627,13 +627,17 @@ HoodieWriteConfig can be built using a builder pattern as
below.</p>
<h4 id="bloom-index-configs">Bloom Index configs</h4>
+<h4 id="bloomIndexFilterType">bloomIndexFilterType(bucketizedChecking =
BloomFilterTypeCode.SIMPLE)</h4>
+<p>Property: <code
class="highlighter-rouge">hoodie.bloom.index.filter.type</code> <br />
+<span style="color:grey">Filter type used. Default is
BloomFilterTypeCode.SIMPLE. Available values are [BloomFilterTypeCode.SIMPLE ,
BloomFilterTypeCode.DYNAMIC_V0]. Dynamic bloom filters auto size themselves
based on number of keys.</span></p>
+
<h4 id="bloomFilterNumEntries">bloomFilterNumEntries(numEntries = 60000)</h4>
<p>Property: <code
class="highlighter-rouge">hoodie.index.bloom.num_entries</code> <br />
-<span style="color:grey">Only applies if index type is BLOOM. <br />This is
the number of entries to be stored in the bloom filter. We assume the
maxParquetFileSize is 128MB and averageRecordSize is 1024B and hence we approx
a total of 130K records in a file. The default (60000) is roughly half of this
approximation. <a
href="https://issues.apache.org/jira/browse/HUDI-56">HUDI-56</a> tracks
computing this dynamically. Warning: Setting this very low, will generate a lot
of false positives [...]
+<span style="color:grey">Only applies if index type is BLOOM. <br />This is
the number of entries to be stored in the bloom filter. We assume the
maxParquetFileSize is 128MB and averageRecordSize is 1024B and hence we approx
a total of 130K records in a file. The default (60000) is roughly half of this
approximation. <a
href="https://issues.apache.org/jira/browse/HUDI-56">HUDI-56</a> tracks
computing this dynamically. Warning: Setting this very low, will generate a lot
of false positives [...]
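The arithmetic behind that default can be spelled out. A minimal sketch (my own, not Hudi code; the 128MB/1024B figures come from the paragraph above):

```java
// Arithmetic behind the default: with an assumed 128MB parquet file size and
// a 1024B average record size, one file holds roughly 130K records, and the
// default of 60000 entries is about half of that.
public class NumEntriesEstimate {
    static long approxRecordsPerFile(long maxParquetFileSizeBytes, long averageRecordSizeBytes) {
        return maxParquetFileSizeBytes / averageRecordSizeBytes;
    }
    public static void main(String[] args) {
        long records = approxRecordsPerFile(128L * 1024 * 1024, 1024);
        System.out.println(records); // 131072, i.e. ~130K
    }
}
```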
<h4 id="bloomFilterFPP">bloomFilterFPP(fpp = 0.000000001)</h4>
<p>Property: <code class="highlighter-rouge">hoodie.index.bloom.fpp</code> <br
/>
-<span style="color:grey">Only applies if index type is BLOOM. <br /> Error
rate allowed given the number of entries. This is used to calculate how many
bits should be assigned for the bloom filter and the number of hash functions.
This is usually set very low (default: 0.000000001), we like to tradeoff disk
space for lower false positives</span></p>
+<span style="color:grey">Only applies if index type is BLOOM. <br /> Error
rate allowed given the number of entries. This is used to calculate how many
bits should be assigned for the bloom filter and the number of hash functions.
This is usually set very low (default: 0.000000001), we like to tradeoff disk
space for lower false positives. If the number of entries added to bloom filter
exceeds the congfigured value (<code
class="highlighter-rouge">hoodie.index.bloom.num_entries</code>), [...]
<h4 id="bloomIndexParallelism">bloomIndexParallelism(0)</h4>
<p>Property: <code
class="highlighter-rouge">hoodie.bloom.index.parallelism</code> <br />
@@ -641,7 +645,7 @@ HoodieWriteConfig can be built using a builder pattern as
below.</p>
<h4 id="bloomIndexPruneByRanges">bloomIndexPruneByRanges(pruneRanges =
true)</h4>
<p>Property: <code
class="highlighter-rouge">hoodie.bloom.index.prune.by.ranges</code> <br />
-<span style="color:grey">Only applies if index type is BLOOM. <br /> When
true, range information from files to leveraged speed up index lookups.
Particularly helpful, if the key has a monotonously increasing prefix, such as
timestamp.</span></p>
+<span style="color:grey">Only applies if index type is BLOOM. <br /> When
true, range information from files to leveraged speed up index lookups.
Particularly helpful, if the key has a monotonously increasing prefix, such as
timestamp. If the record key is completely random, it is better to turn this
off.</span></p>
<h4 id="bloomIndexUseCaching">bloomIndexUseCaching(useCaching = true)</h4>
<p>Property: <code
class="highlighter-rouge">hoodie.bloom.index.use.caching</code> <br />
@@ -655,10 +659,6 @@ HoodieWriteConfig can be built using a builder pattern as
below.</p>
<p>Property: <code
class="highlighter-rouge">hoodie.bloom.index.bucketized.checking</code> <br />
<span style="color:grey">Only applies if index type is BLOOM. <br /> When
true, bucketized bloom filtering is enabled. This reduces skew seen in sort
based bloom index lookup</span></p>
-<h4 id="bloomIndexFilterType">bloomIndexFilterType(bucketizedChecking =
BloomFilterTypeCode.SIMPLE)</h4>
-<p>Property: <code
class="highlighter-rouge">hoodie.bloom.index.filter.type</code> <br />
-<span style="color:grey">Filter type used. Default is
BloomFilterTypeCode.SIMPLE. Available values are [BloomFilterTypeCode.SIMPLE ,
BloomFilterTypeCode.DYNAMIC_V0]. Dynamic bloom filters auto size themselves
based on number of keys</span></p>
-
<h4
id="bloomIndexFilterDynamicMaxEntries">bloomIndexFilterDynamicMaxEntries(maxNumberOfEntries
= 100000)</h4>
<p>Property: <code
class="highlighter-rouge">hoodie.bloom.index.filter.dynamic.max.entries</code>
<br />
<span style="color:grey">The threshold for the maximum number of keys to
record in a dynamic Bloom filter row. Only applies if filter type is
BloomFilterTypeCode.DYNAMIC_V0.</span></p>
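The idea of a dynamic filter that "auto-sizes" up to a per-row max-entries threshold can be illustrated with a small conceptual sketch. This is not Hudi's DYNAMIC_V0 implementation (which uses real bloom-filter rows, not sets); it only shows the growth behavior the config controls:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Conceptual sketch only: a "dynamic" filter that starts a fresh
// fixed-capacity row once the current row reaches the configured
// max-entries threshold, so the structure grows with the number of keys.
public class DynamicFilterSketch {
    private final int maxEntriesPerRow;
    private final List<Set<String>> rows = new ArrayList<>();

    DynamicFilterSketch(int maxEntriesPerRow) {
        this.maxEntriesPerRow = maxEntriesPerRow;
        rows.add(new HashSet<>());
    }
    void add(String key) {
        Set<String> current = rows.get(rows.size() - 1);
        if (current.size() >= maxEntriesPerRow) {
            current = new HashSet<>();
            rows.add(current);  // grow: open a new row
        }
        current.add(key);
    }
    boolean mightContain(String key) {
        for (Set<String> row : rows) if (row.contains(key)) return true;
        return false;
    }
    int numRows() { return rows.size(); }
}
```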
diff --git a/content/docs/docker_demo.html b/content/docs/docker_demo.html
index 516f54f..66f3dfc 100644
--- a/content/docs/docker_demo.html
+++ b/content/docs/docker_demo.html
@@ -573,7 +573,7 @@ inorder to run Hive queries against those tables.</p>
<div class="language-java highlighter-rouge"><div class="highlight"><pre
class="highlight"><code><span class="n">docker</span> <span
class="n">exec</span> <span class="o">-</span><span class="n">it</span> <span
class="n">adhoc</span><span class="o">-</span><span class="mi">2</span> <span
class="o">/</span><span class="n">bin</span><span class="o">/</span><span
class="n">bash</span>
<span class="err">#</span> <span class="nc">THis</span> <span
class="n">command</span> <span class="n">takes</span> <span class="n">in</span>
<span class="nc">HIveServer</span> <span class="no">URL</span> <span
class="n">and</span> <span class="no">COW</span> <span class="nc">Hudi</span>
<span class="n">table</span> <span class="n">location</span> <span
class="n">in</span> <span class="no">HDFS</span> <span class="n">and</span>
<span class="n">sync</span> <span class="n">the</span> <span [...]
-<span class="o">/</span><span class="kt">var</span><span
class="o">/</span><span class="n">hoodie</span><span class="o">/</span><span
class="n">ws</span><span class="o">/</span><span class="n">hudi</span><span
class="o">-</span><span class="n">hive</span><span class="o">-</span><span
class="n">sync</span><span class="o">/</span><span
class="n">run_sync_tool</span><span class="o">.</span><span
class="na">sh</span> <span class="err">\</span>
+<span class="o">/</span><span class="kt">var</span><span
class="o">/</span><span class="n">hoodie</span><span class="o">/</span><span
class="n">ws</span><span class="o">/</span><span class="n">hudi</span><span
class="o">-</span><span class="n">sync</span><span class="o">/</span><span
class="n">hudi</span><span class="o">-</span><span class="n">hive</span><span
class="o">-</span><span class="n">sync</span><span class="o">/</span><span
class="n">run_sync_tool</span><span class="o">.</span> [...]
<span class="o">--</span><span class="n">jdbc</span><span
class="o">-</span><span class="n">url</span> <span
class="nl">jdbc:hive2:</span><span class="c1">//hiveserver:10000 \</span>
<span class="o">--</span><span class="n">user</span> <span
class="n">hive</span> <span class="err">\</span>
<span class="o">--</span><span class="n">pass</span> <span
class="n">hive</span> <span class="err">\</span>
@@ -586,7 +586,7 @@ inorder to run Hive queries against those tables.</p>
<span class="o">.....</span>
<span class="err">#</span> <span class="nc">Now</span> <span
class="n">run</span> <span class="n">hive</span><span class="o">-</span><span
class="n">sync</span> <span class="k">for</span> <span class="n">the</span>
<span class="n">second</span> <span class="n">data</span><span
class="o">-</span><span class="n">set</span> <span class="n">in</span> <span
class="no">HDFS</span> <span class="n">using</span> <span
class="nc">Merge</span><span class="o">-</span><span class="nc">On</span><span
[...]
-<span class="o">/</span><span class="kt">var</span><span
class="o">/</span><span class="n">hoodie</span><span class="o">/</span><span
class="n">ws</span><span class="o">/</span><span class="n">hudi</span><span
class="o">-</span><span class="n">hive</span><span class="o">-</span><span
class="n">sync</span><span class="o">/</span><span
class="n">run_sync_tool</span><span class="o">.</span><span
class="na">sh</span> <span class="err">\</span>
+<span class="o">/</span><span class="kt">var</span><span
class="o">/</span><span class="n">hoodie</span><span class="o">/</span><span
class="n">ws</span><span class="o">/</span><span class="n">hudi</span><span
class="o">-</span><span class="n">sync</span><span class="o">/</span><span
class="n">hudi</span><span class="o">-</span><span class="n">hive</span><span
class="o">-</span><span class="n">sync</span><span class="o">/</span><span
class="n">run_sync_tool</span><span class="o">.</span> [...]
<span class="o">--</span><span class="n">jdbc</span><span
class="o">-</span><span class="n">url</span> <span
class="nl">jdbc:hive2:</span><span class="c1">//hiveserver:10000 \</span>
<span class="o">--</span><span class="n">user</span> <span
class="n">hive</span> <span class="err">\</span>
<span class="o">--</span><span class="n">pass</span> <span
class="n">hive</span> <span class="err">\</span>
diff --git a/content/docs/writing_data.html b/content/docs/writing_data.html
index 8475a3e..0ab4f2b 100644
--- a/content/docs/writing_data.html
+++ b/content/docs/writing_data.html
@@ -556,13 +556,13 @@ provided under <code
class="highlighter-rouge">hudi-utilities/src/test/resources
<p><strong><code
class="highlighter-rouge">DataSourceWriteOptions</code></strong>:</p>
-<p><strong>RECORDKEY_FIELD_OPT_KEY</strong> (Required): Primary key field(s).
Nested fields can be specified using the dot notation eg: <code
class="highlighter-rouge">a.b.c</code>. When using multiple columns as primary
key use comma separated notation, eg: <code
class="highlighter-rouge">"col1,col2,col3,etc"</code>. Single or multiple
columns as primary key specified by <code
class="highlighter-rouge">KEYGENERATOR_CLASS_OPT_KEY</code> property.<br />
+<p><strong>RECORDKEY_FIELD_OPT_KEY</strong> (Required): Primary key field(s).
Record keys uniquely identify a record/row within each partition. If you need
global uniqueness, there are two options: you can either make the dataset
non-partitioned, or leverage global indexes to ensure record keys are unique
irrespective of the partition path. Record keys can either be a single column
or refer to multiple columns. <code
class="highlighter-rouge">KEYGENERATOR_CLASS_OPT_ [...]
Default value: <code class="highlighter-rouge">"uuid"</code><br /></p>
-<p><strong>PARTITIONPATH_FIELD_OPT_KEY</strong> (Required): Columns to be used
for partitioning the table. To prevent partitioning, provide empty string as
value eg: <code class="highlighter-rouge">""</code>. Specify partitioning/no
partitioning using <code
class="highlighter-rouge">KEYGENERATOR_CLASS_OPT_KEY</code>. If synchronizing
to hive, also specify using <code
class="highlighter-rouge">HIVE_PARTITION_EXTRACTOR_CLASS_OPT_KEY.</code><br />
+<p><strong>PARTITIONPATH_FIELD_OPT_KEY</strong> (Required): Columns to be used
for partitioning the table. To prevent partitioning, provide an empty string as
the value, e.g. <code class="highlighter-rouge">""</code>. Specify
partitioning/no partitioning using <code
class="highlighter-rouge">KEYGENERATOR_CLASS_OPT_KEY</code>. If the partition
path needs to be URL-encoded, you can set <code
class="highlighter-rouge">URL_ENCODE_PARTITIONING_OPT_KEY</code>. If
synchronizing to Hive, also specify using < [...]
Default value: <code class="highlighter-rouge">"partitionpath"</code><br /></p>
-<p><strong>PRECOMBINE_FIELD_OPT_KEY</strong> (Required): When two records have
the same key value, the record with the largest value from the field specified
will be choosen.<br />
+<p><strong>PRECOMBINE_FIELD_OPT_KEY</strong> (Required): When two records
within the same batch have the same key value, the record with the largest
value for the specified field will be chosen. If you are using the default
payload, OverwriteWithLatestAvroPayload, for HoodieRecordPayload (<code
class="highlighter-rouge">WRITE_PAYLOAD_CLASS</code>), an incoming record will
always take precedence over the one in storage, ignoring this <code
class="highlighter-rouge">PRECOMBINE_FIELD [...]
Default value: <code class="highlighter-rouge">"ts"</code><br /></p>
<p><strong>OPERATION_OPT_KEY</strong>: The <a href="#write-operations">write
operations</a> to use.<br />