This is an automated email from the ASF dual-hosted git repository.
vinoth pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git
The following commit(s) were added to refs/heads/asf-site by this push:
new c986463 Travis CI build asf-site
c986463 is described below
commit c986463cc160757db53ab3f2e04404aff382cb96
Author: CI <[email protected]>
AuthorDate: Thu May 13 18:48:34 2021 +0000
Travis CI build asf-site
---
content/docs/0.6.0-configurations.html | 5 +++++
content/docs/0.7.0-configurations.html | 13 +++++++++----
content/docs/0.8.0-configurations.html | 10 ++++++++++
content/docs/configurations.html | 18 ++++++++++++++----
4 files changed, 38 insertions(+), 8 deletions(-)
diff --git a/content/docs/0.6.0-configurations.html
b/content/docs/0.6.0-configurations.html
index 45533f8..ee34c9f 100644
--- a/content/docs/0.6.0-configurations.html
+++ b/content/docs/0.6.0-configurations.html
@@ -464,6 +464,11 @@ This is useful to store checkpointing information, in a
consistent way with the
<p>Property: <code
class="highlighter-rouge">hoodie.datasource.write.insert.drop.duplicates</code>,
Default: <code class="highlighter-rouge">false</code> <br />
<span style="color:grey">If set to true, filters out all duplicate records
from incoming dataframe, during insert operations. </span></p>
+<h4 id="ENABLE_ROW_WRITER_OPT_KEY">ENABLE_ROW_WRITER_OPT_KEY</h4>
+<p>Property: <code
class="highlighter-rouge">hoodie.datasource.write.row.writer.enable</code>,
Default: <code class="highlighter-rouge">false</code> <br />
+<span style="color:grey">When set to true, write operations are performed
directly using Spark's native <code class="highlighter-rouge">Row</code>
+representation, which is expected to be 20% to 30% faster than regular
bulk_insert.</span></p>
+
<h4 id="HIVE_SYNC_ENABLED_OPT_KEY">HIVE_SYNC_ENABLED_OPT_KEY</h4>
<p>Property: <code
class="highlighter-rouge">hoodie.datasource.hive_sync.enable</code>, Default:
<code class="highlighter-rouge">false</code> <br />
<span style="color:grey">When set to true, register/sync the table to Apache
Hive metastore</span></p>
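
The two datasource options added in this hunk can be set together from Spark. A minimal sketch follows; the config keys come straight from this page, while the `bulk_insert` operation key and the commented-out DataFrame write call are illustrative assumptions not shown in this diff:

```python
# Sketch of the datasource write options documented above. The two
# enable/sync keys are real Hudi configs from this page; the operation
# key and the commented-out write call are illustrative assumptions.
hudi_options = {
    # Assumed: the row-writer path applies to bulk_insert operations.
    "hoodie.datasource.write.operation": "bulk_insert",
    # ENABLE_ROW_WRITER_OPT_KEY: write using Spark's native Row
    # representation (expected 20% to 30% faster than regular bulk_insert).
    "hoodie.datasource.write.row.writer.enable": "true",
    # HIVE_SYNC_ENABLED_OPT_KEY: register/sync the table to the Apache
    # Hive metastore after the write.
    "hoodie.datasource.hive_sync.enable": "true",
}

# Hypothetical usage (requires a SparkSession with the Hudi bundle):
# df.write.format("hudi").options(**hudi_options).mode("append").save(base_path)
```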
diff --git a/content/docs/0.7.0-configurations.html
b/content/docs/0.7.0-configurations.html
index 0cac848..1881d86 100644
--- a/content/docs/0.7.0-configurations.html
+++ b/content/docs/0.7.0-configurations.html
@@ -445,6 +445,11 @@ This is useful to store checkpointing information, in a
consistent way with the
<p>Property: <code
class="highlighter-rouge">hoodie.datasource.write.insert.drop.duplicates</code>,
Default: <code class="highlighter-rouge">false</code> <br />
<span style="color:grey">If set to true, filters out all duplicate records
from incoming dataframe, during insert operations. </span></p>
+<h4 id="ENABLE_ROW_WRITER_OPT_KEY">ENABLE_ROW_WRITER_OPT_KEY</h4>
+<p>Property: <code
class="highlighter-rouge">hoodie.datasource.write.row.writer.enable</code>,
Default: <code class="highlighter-rouge">false</code> <br />
+<span style="color:grey">When set to true, write operations are performed
directly using Spark's native <code class="highlighter-rouge">Row</code>
+representation, which is expected to be 20% to 30% faster than regular
bulk_insert.</span></p>
+
<h4 id="HIVE_SYNC_ENABLED_OPT_KEY">HIVE_SYNC_ENABLED_OPT_KEY</h4>
<p>Property: <code
class="highlighter-rouge">hoodie.datasource.hive_sync.enable</code>, Default:
<code class="highlighter-rouge">false</code> <br />
<span style="color:grey">When set to true, register/sync the table to Apache
Hive metastore</span></p>
@@ -593,10 +598,6 @@ HoodieWriteConfig can be built using a builder pattern as
below.</p>
<p>Property: <code class="highlighter-rouge">hoodie.auto.commit</code><br />
<span style="color:grey">Should HoodieWriteClient autoCommit after insert and
upsert. The client can choose to turn off auto-commit and commit on a “defined
success condition”</span></p>
-<h4
id="withAssumeDatePartitioning">withAssumeDatePartitioning(assumeDatePartitioning
= false)</h4>
-<p>Property: <code
class="highlighter-rouge">hoodie.assume.date.partitioning</code><br />
-<span style="color:grey">Should HoodieWriteClient assume the data is
partitioned by dates, i.e three levels from base path. This is a stop-gap to
support tables created by versions < 0.3.1. Will be removed eventually
</span></p>
-
<h4 id="withConsistencyCheckEnabled">withConsistencyCheckEnabled(enabled =
false)</h4>
<p>Property: <code
class="highlighter-rouge">hoodie.consistency.check.enabled</code><br />
<span style="color:grey">Should HoodieWriteClient perform additional checks to
ensure written files’ are listable on the underlying filesystem/storage. Set
this to true, to workaround S3’s eventual consistency model and ensure all data
written as a part of a commit is faithfully available for queries. </span></p>
@@ -909,6 +910,10 @@ with keys/footers, avoiding full cost of rewriting the
dataset. <code class="hig
<p>Property: <code
class="highlighter-rouge">hoodie.metadata.keep.min.commits</code>, <code
class="highlighter-rouge">hoodie.metadata.keep.max.commits</code> <br />
<span style="color:grey"> Controls the archival of the metadata table’s
timeline </span></p>
+<h4
id="withAssumeDatePartitioning">withAssumeDatePartitioning(assumeDatePartitioning
= false)</h4>
+<p>Property: <code
class="highlighter-rouge">hoodie.assume.date.partitioning</code><br />
+<span style="color:grey">Should HoodieWriteClient assume the data is
partitioned by dates, i.e three levels from base path. This is a stop-gap to
support tables created by versions < 0.3.1. Will be removed eventually
</span></p>
+
<h3 id="clustering-configs">Clustering Configs</h3>
<p>Controls clustering operations in hudi. Each clustering has to be
configured for its strategy, and config params. This config drives the same.</p>
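
As a recap of the writer-level flags touched in the 0.7.0 hunks above, expressed as plain key/value properties — a sketch only; the keys are real Hudi configs from this page, while the values shown are illustrative choices, not taken from this diff:

```python
# Writer-level properties referenced in the 0.7.0 hunks above; keys are
# real Hudi configs from this page, values here are illustrative.
write_config = {
    # Commit automatically after insert/upsert, or turn this off and
    # commit on a user-defined success condition.
    "hoodie.auto.commit": "true",
    # Extra listing checks after writes; useful as a workaround on
    # eventually consistent stores such as S3.
    "hoodie.consistency.check.enabled": "false",
    # Stop-gap for tables created by Hudi versions before 0.3.1 that
    # assume a three-level date partition layout under the base path.
    "hoodie.assume.date.partitioning": "false",
}
```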
diff --git a/content/docs/0.8.0-configurations.html
b/content/docs/0.8.0-configurations.html
index ed0305d..c763f18 100644
--- a/content/docs/0.8.0-configurations.html
+++ b/content/docs/0.8.0-configurations.html
@@ -453,6 +453,11 @@ This is useful to store checkpointing information, in a
consistent way with the
<p>Property: <code
class="highlighter-rouge">hoodie.datasource.write.insert.drop.duplicates</code>,
Default: <code class="highlighter-rouge">false</code> <br />
<span style="color:grey">If set to true, filters out all duplicate records
from incoming dataframe, during insert operations. </span></p>
+<h4 id="ENABLE_ROW_WRITER_OPT_KEY">ENABLE_ROW_WRITER_OPT_KEY</h4>
+<p>Property: <code
class="highlighter-rouge">hoodie.datasource.write.row.writer.enable</code>,
Default: <code class="highlighter-rouge">false</code> <br />
+<span style="color:grey">When set to true, write operations are performed
directly using Spark's native <code class="highlighter-rouge">Row</code>
+representation, which is expected to be 20% to 30% faster than regular
bulk_insert.</span></p>
+
<h4 id="HIVE_SYNC_ENABLED_OPT_KEY">HIVE_SYNC_ENABLED_OPT_KEY</h4>
<p>Property: <code
class="highlighter-rouge">hoodie.datasource.hive_sync.enable</code>, Default:
<code class="highlighter-rouge">false</code> <br />
<span style="color:grey">When set to true, register/sync the table to Apache
Hive metastore</span></p>
@@ -820,6 +825,11 @@ HoodieWriteConfig can be built using a builder pattern as
below.</p>
<p>Property: <code
class="highlighter-rouge">hoodie.combine.before.delete</code><br />
<span style="color:grey">Flag which first combines the input RDD and merges
multiple partial records into a single record before deleting in DFS</span></p>
+<h4
id="withMergeAllowDuplicateOnInserts">withMergeAllowDuplicateOnInserts(mergeAllowDuplicateOnInserts
= false)</h4>
+<p>Property: <code
class="highlighter-rouge">hoodie.merge.allow.duplicate.on.inserts</code> <br />
+<span style="color:grey"> When enabled, new records are routed as plain inserts
and are not merged with existing records.
+The result may contain duplicate entries. </span></p>
+
<h4 id="withWriteStatusStorageLevel">withWriteStatusStorageLevel(level =
MEMORY_AND_DISK_SER)</h4>
<p>Property: <code
class="highlighter-rouge">hoodie.write.status.storage.level</code><br />
<span style="color:grey">HoodieWriteClient.insert and HoodieWriteClient.upsert
returns a persisted RDD[WriteStatus], this is because the Client can choose to
inspect the WriteStatus and choose and commit or not based on the failures.
This is a configuration for the storage level for this RDD </span></p>
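
The two write-client options from the 0.8.0 hunk above can be sketched as plain properties; the keys are real Hudi configs from this page, and the values shown are illustrative (the storage-level value mirrors the documented default):

```python
# Write-client properties from the 0.8.0 hunk above; keys are real Hudi
# configs from this page, values are illustrative.
write_options = {
    # withMergeAllowDuplicateOnInserts: route new records as plain
    # inserts without merging into existing records; the result may
    # contain duplicate entries.
    "hoodie.merge.allow.duplicate.on.inserts": "true",
    # withWriteStatusStorageLevel: storage level for the persisted
    # RDD[WriteStatus] returned by insert/upsert.
    "hoodie.write.status.storage.level": "MEMORY_AND_DISK_SER",
}
```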
diff --git a/content/docs/configurations.html b/content/docs/configurations.html
index 4bae5c5..494c857 100644
--- a/content/docs/configurations.html
+++ b/content/docs/configurations.html
@@ -477,6 +477,11 @@ This is useful to store checkpointing information, in a
consistent way with the
<p>Property: <code
class="highlighter-rouge">hoodie.datasource.write.insert.drop.duplicates</code>,
Default: <code class="highlighter-rouge">false</code> <br />
<span style="color:grey">If set to true, filters out all duplicate records
from incoming dataframe, during insert operations. </span></p>
+<h4 id="ENABLE_ROW_WRITER_OPT_KEY">ENABLE_ROW_WRITER_OPT_KEY</h4>
+<p>Property: <code
class="highlighter-rouge">hoodie.datasource.write.row.writer.enable</code>,
Default: <code class="highlighter-rouge">false</code> <br />
+ <span style="color:grey">When set to true, write operations are performed
directly using Spark's native <code class="highlighter-rouge">Row</code>
+ representation, which is expected to be 20% to 30% faster than regular
bulk_insert.</span></p>
+
<h4 id="HIVE_SYNC_ENABLED_OPT_KEY">HIVE_SYNC_ENABLED_OPT_KEY</h4>
<p>Property: <code
class="highlighter-rouge">hoodie.datasource.hive_sync.enable</code>, Default:
<code class="highlighter-rouge">false</code> <br />
<span style="color:grey">When set to true, register/sync the table to Apache
Hive metastore</span></p>
@@ -1039,6 +1044,11 @@ HoodieWriteConfig can be built using a builder pattern
as below.</p>
<p>Property: <code
class="highlighter-rouge">hoodie.combine.before.delete</code><br />
<span style="color:grey">Flag which first combines the input RDD and merges
multiple partial records into a single record before deleting in DFS</span></p>
+<h4
id="withMergeAllowDuplicateOnInserts">withMergeAllowDuplicateOnInserts(mergeAllowDuplicateOnInserts
= false)</h4>
+<p>Property: <code
class="highlighter-rouge">hoodie.merge.allow.duplicate.on.inserts</code> <br />
+<span style="color:grey"> When enabled, new records are routed as plain inserts
and are not merged with existing records.
+The result may contain duplicate entries. </span></p>
+
<h4 id="withWriteStatusStorageLevel">withWriteStatusStorageLevel(level =
MEMORY_AND_DISK_SER)</h4>
<p>Property: <code
class="highlighter-rouge">hoodie.write.status.storage.level</code><br />
<span style="color:grey">HoodieWriteClient.insert and HoodieWriteClient.upsert
returns a persisted RDD[WriteStatus], this is because the Client can choose to
inspect the WriteStatus and choose and commit or not based on the failures.
This is a configuration for the storage level for this RDD </span></p>
@@ -1047,10 +1057,6 @@ HoodieWriteConfig can be built using a builder pattern
as below.</p>
<p>Property: <code class="highlighter-rouge">hoodie.auto.commit</code><br />
<span style="color:grey">Should HoodieWriteClient autoCommit after insert and
upsert. The client can choose to turn off auto-commit and commit on a “defined
success condition”</span></p>
-<h4
id="withAssumeDatePartitioning">withAssumeDatePartitioning(assumeDatePartitioning
= false)</h4>
-<p>Property: <code
class="highlighter-rouge">hoodie.assume.date.partitioning</code><br />
-<span style="color:grey">Should HoodieWriteClient assume the data is
partitioned by dates, i.e three levels from base path. This is a stop-gap to
support tables created by versions < 0.3.1. Will be removed eventually
</span></p>
-
<h4 id="withConsistencyCheckEnabled">withConsistencyCheckEnabled(enabled =
false)</h4>
<p>Property: <code
class="highlighter-rouge">hoodie.consistency.check.enabled</code><br />
<span style="color:grey">Should HoodieWriteClient perform additional checks to
ensure written files’ are listable on the underlying filesystem/storage. Set
this to true, to workaround S3’s eventual consistency model and ensure all data
written as a part of a commit is faithfully available for queries. </span></p>
@@ -1367,6 +1373,10 @@ with keys/footers, avoiding full cost of rewriting the
dataset. <code class="hig
<p>Property: <code
class="highlighter-rouge">hoodie.metadata.keep.min.commits</code>, <code
class="highlighter-rouge">hoodie.metadata.keep.max.commits</code> <br />
<span style="color:grey"> Controls the archival of the metadata table’s
timeline </span></p>
+<h4
id="withAssumeDatePartitioning">withAssumeDatePartitioning(assumeDatePartitioning
= false)</h4>
+<p>Property: <code
class="highlighter-rouge">hoodie.assume.date.partitioning</code><br />
+<span style="color:grey">Should HoodieWriteClient assume the data is
partitioned by dates, i.e three levels from base path. This is a stop-gap to
support tables created by versions < 0.3.1. Will be removed eventually
</span></p>
+
<h3 id="clustering-configs">Clustering Configs</h3>
<p>Controls clustering operations in hudi. Each clustering has to be
configured for its strategy, and config params. This config drives the same.</p>
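
The clustering paragraph above names only the config family, not specific keys. As one hedged illustration, the keys below are standard Hudi inline-clustering configs from the 0.7+ line and are assumptions not taken from this diff:

```python
# Illustrative Hudi clustering properties; these keys are assumptions
# (standard 0.7+ clustering configs), not taken from this diff.
clustering_config = {
    # Run clustering inline as part of regular write operations.
    "hoodie.clustering.inline": "true",
    # Trigger a clustering plan after this many commits.
    "hoodie.clustering.inline.max.commits": "4",
}
```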