(hudi) branch asf-site updated: chore(site): update FAQ ref links and ks3 fs (#14124)

xushiyan Tue, 21 Oct 2025 19:28:02 -0700

This is an automated email from the ASF dual-hosted git repository.

xushiyan pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git



The following commit(s) were added to refs/heads/asf-site by this push:
     new ca1d36b71d90 chore(site): update FAQ ref links and ks3 fs (#14124)
ca1d36b71d90 is described below

commit ca1d36b71d905deb35570c6ce57f85498501ef85
Author: Shiyan Xu <[email protected]>
AuthorDate: Tue Oct 21 21:27:46 2025 -0500

    chore(site): update FAQ ref links and ks3 fs (#14124)
---
 website/docs/cloud.md                                          |  4 +++-
 docs/_docs/3_3_ks3_filesystem.md => website/docs/ks3_hoodie.md |  3 +--
 website/docs/quick-start-guide.md                              |  2 +-
 website/docs/record_merger.md                                  |  2 +-
 website/docs/writing_data.md                                   |  4 ++--
 website/docusaurus.config.js                                   |  4 ++++
 website/sidebars.js                                            |  3 ++-
 website/src/pages/faq/design_and_concepts.md                   |  4 ++--
 website/src/pages/faq/general.md                               |  2 +-
 website/src/pages/faq/storage.md                               |  6 +++---
 website/src/pages/faq/table_services.md                        |  6 +++---
 website/src/pages/faq/writing_tables.md                        | 10 +++++-----
 12 files changed, 28 insertions(+), 22 deletions(-)

diff --git a/website/docs/cloud.md b/website/docs/cloud.md
index 123abd5e6bea..a6e0068bb58f 100644
--- a/website/docs/cloud.md
+++ b/website/docs/cloud.md
@@ -29,10 +29,12 @@ to cloud stores.
    Configurations required for JuiceFS and Hudi co-operability.
 * [Oracle Cloud Infrastructure](oci_hoodie) <br/>
    Configurations required for OCI and Hudi co-operability.
+* [KS3 File System](ks3_hoodie) <br/>
+   Configurations required for KS3 FS and Hudi co-operability.
 
 :::note 
 Many cloud object storage systems like [Amazon S3](https://docs.aws.amazon.com/s3/) allow you to set
 lifecycle policies, such as [S3 Lifecycle](https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-lifecycle-mgmt.html),
 to manage objects. One of the policies is related to object expiration. If your organisation has configured such policies, 
 then please ensure to exclude (or have a longer expiry period) for Hudi tables.
-:::
\ No newline at end of file
+:::
diff --git a/docs/_docs/3_3_ks3_filesystem.md b/website/docs/ks3_hoodie.md
similarity index 93%
rename from docs/_docs/3_3_ks3_filesystem.md
rename to website/docs/ks3_hoodie.md
index 1445636858fb..c9d6bc9c59ef 100644
--- a/docs/_docs/3_3_ks3_filesystem.md
+++ b/website/docs/ks3_hoodie.md
@@ -1,7 +1,6 @@
 ---
 title: KS3 Filesystem
-keywords: hudi, hive, aws, s3, spark, presto, ks3
-permalink: /docs/ks3_hoodie.html
+keywords: [hudi, hive, aws, s3, spark, presto, ks3]
 summary: In this page, we go over how to configure Hudi with KS3 filesystem.
 last_modified_at: 2021-08-09T15:59:57-04:00
 ---
diff --git a/website/docs/quick-start-guide.md b/website/docs/quick-start-guide.md
index 388482281f6a..9a91fa3e17fa 100644
--- a/website/docs/quick-start-guide.md
+++ b/website/docs/quick-start-guide.md
@@ -1305,7 +1305,7 @@ transformation support, automatic table services and so on.
 
 **Structured Streaming** - Hudi supports Spark Structured Streaming reads and writes as well. Please see [here](writing_tables_streaming_writes#spark-streaming) for more.
 
-Check out more information on [modeling data in Hudi](faq/general#how-do-i-model-the-data-stored-in-hudi) and different ways to perform [batch writes](/docs/writing_data) and [streaming writes](writing_tables_streaming_writes).
+Check out more information on [modeling data in Hudi](/faq/general#how-do-i-model-the-data-stored-in-hudi) and different ways to perform [batch writes](/docs/writing_data) and [streaming writes](writing_tables_streaming_writes).
 
 ### Dockerized Demo
 Even as we showcased the core capabilities, Hudi supports a lot more advanced functionality that can make it easy
diff --git a/website/docs/record_merger.md b/website/docs/record_merger.md
index 4c47dbde1b6a..773ee8995f15 100644
--- a/website/docs/record_merger.md
+++ b/website/docs/record_merger.md
@@ -249,7 +249,7 @@ Payload class can be specified using the below configs. For more advanced config
 There are also quite a few other implementations. Developers may be interested in looking at the hierarchy of `HoodieRecordPayload` interface. For
 example, [`MySqlDebeziumAvroPayload`](https://github.com/apache/hudi/blob/e76dd102bcaf8aec5a932e7277ccdbfd73ce1a32/hudi-common/src/main/java/org/apache/hudi/common/model/debezium/MySqlDebeziumAvroPayload.java) and [`PostgresDebeziumAvroPayload`](https://github.com/apache/hudi/blob/e76dd102bcaf8aec5a932e7277ccdbfd73ce1a32/hudi-common/src/main/java/org/apache/hudi/common/model/debezium/PostgresDebeziumAvroPayload.java) provides support for seamlessly applying changes 
 captured via Debezium for MySQL and PostgresDB. [`AWSDmsAvroPayload`](https://github.com/apache/hudi/blob/e76dd102bcaf8aec5a932e7277ccdbfd73ce1a32/hudi-common/src/main/java/org/apache/hudi/common/model/AWSDmsAvroPayload.java) provides support for applying changes captured via Amazon Database Migration Service onto S3.
-For full configurations, go [here](/docs/configurations#RECORD_PAYLOAD) and please check out [this FAQ](faq/writing_tables/#can-i-implement-my-own-logic-for-how-input-records-are-merged-with-record-on-storage) if you want to implement your own custom payloads.
+For full configurations, go [here](/docs/configurations#RECORD_PAYLOAD) and please check out [this FAQ](/faq/writing_tables/#can-i-implement-my-own-logic-for-how-input-records-are-merged-with-record-on-storage) if you want to implement your own custom payloads.
 
 ## Related Resources
 
diff --git a/website/docs/writing_data.md b/website/docs/writing_data.md
index 6d3272378e55..2e193f80359c 100644
--- a/website/docs/writing_data.md
+++ b/website/docs/writing_data.md
@@ -83,7 +83,7 @@ df.write.format("hudi").
 You can check the data generated under `/tmp/hudi_trips_cow/<region>/<country>/<city>/`. We provided a record key
 (`uuid` in [schema](https://github.com/apache/hudi/blob/6f9b02decb5bb2b83709b1b6ec04a97e4d102c11/hudi-spark-datasource/hudi-spark/src/main/java/org/apache/hudi/QuickstartUtils.java#L60)), partition field (`region/country/city`) and combine logic (`ts` in
 [schema](https://github.com/apache/hudi/blob/6f9b02decb5bb2b83709b1b6ec04a97e4d102c11/hudi-spark-datasource/hudi-spark/src/main/java/org/apache/hudi/QuickstartUtils.java#L60)) to ensure trip records are unique within each partition. For more info, refer to
-[Modeling data stored in Hudi](faq/general/#how-do-i-model-the-data-stored-in-hudi)
+[Modeling data stored in Hudi](/faq/general/#how-do-i-model-the-data-stored-in-hudi)
 and for info on ways to ingest data into Hudi, refer to [Writing Hudi Tables](/docs/hoodie_streaming_ingestion).
 Here we are using the default write operation : `upsert`. If you have a workload without updates, you can also issue
 `insert` or `bulk_insert` operations which could be faster. To know more, refer to [Write operations](/docs/write_operations)
@@ -119,7 +119,7 @@ df.write.format("hudi").
 You can check the data generated under `/tmp/hudi_trips_cow/<region>/<country>/<city>/`. We provided a record key
 (`uuid` in [schema](https://github.com/apache/hudi/blob/2e6e302efec2fa848ded4f88a95540ad2adb7798/hudi-spark-datasource/hudi-spark/src/main/java/org/apache/hudi/QuickstartUtils.java#L60)), partition field (`region/country/city`) and combine logic (`ts` in
 [schema](https://github.com/apache/hudi/blob/2e6e302efec2fa848ded4f88a95540ad2adb7798/hudi-spark-datasource/hudi-spark/src/main/java/org/apache/hudi/QuickstartUtils.java#L60)) to ensure trip records are unique within each partition. For more info, refer to
-[Modeling data stored in Hudi](faq_general/#how-do-i-model-the-data-stored-in-hudi)
+[Modeling data stored in Hudi](/faq/general/#how-do-i-model-the-data-stored-in-hudi)
 and for info on ways to ingest data into Hudi, refer to [Writing Hudi Tables](/docs/hoodie_streaming_ingestion).
 Here we are using the default write operation : `upsert`. If you have a workload without updates, you can also issue
 `insert` or `bulk_insert` operations which could be faster. To know more, refer to [Write operations](/docs/write_operations)
diff --git a/website/docusaurus.config.js b/website/docusaurus.config.js
index ee85581c0d01..be3e4c01ea77 100644
--- a/website/docusaurus.config.js
+++ b/website/docusaurus.config.js
@@ -402,6 +402,10 @@ module.exports = {
               label: "IBM Cloud",
               to: "/docs/ibm_cos_hoodie",
             },
+            {
+              label: "Oracle Cloud",
+              to: "/docs/oci_hoodie",
+            },
           ],
         },
         {
diff --git a/website/sidebars.js b/website/sidebars.js
index 71c209ebdcaa..233e7ff33696 100644
--- a/website/sidebars.js
+++ b/website/sidebars.js
@@ -135,7 +135,8 @@ module.exports = {
                         'ibm_cos_hoodie',
                         'bos_hoodie',
                         'jfs_hoodie',
-                        'oci_hoodie'
+                        'oci_hoodie',
+                        'ks3_hoodie',
                     ],
                 },
             ],
diff --git a/website/src/pages/faq/design_and_concepts.md b/website/src/pages/faq/design_and_concepts.md
index 6da73cefb728..faa062c9c95c 100644
--- a/website/src/pages/faq/design_and_concepts.md
+++ b/website/src/pages/faq/design_and_concepts.md
@@ -7,7 +7,7 @@ keywords: [hudi, writing, reading]
 
 ### How does Hudi ensure atomicity?
 
-Hudi writers atomically move an inflight write operation to a "completed" state by writing an object/file to the [timeline](timeline) folder, identifying the write operation with an instant time that denotes the time the action is deemed to have occurred. This is achieved on the underlying DFS (in the case of S3/Cloud Storage, by an atomic PUT operation) and can be observed by files of the pattern `<instant>.<action>.<state>` in Hudi’s timeline.
+Hudi writers atomically move an inflight write operation to a "completed" state by writing an object/file to the [timeline](/docs/timeline) folder, identifying the write operation with an instant time that denotes the time the action is deemed to have occurred. This is achieved on the underlying DFS (in the case of S3/Cloud Storage, by an atomic PUT operation) and can be observed by files of the pattern `<instant>.<action>.<state>` in Hudi’s timeline.
 
 ### Does Hudi extend the Hive table layout?
 
@@ -49,7 +49,7 @@ To expand more on the long term approach, Hudi has had a proposal to streamline/
 This has been delayed for a few reasons 
 
 - Large hosted query engines and users not upgrading fast enough. 
-- The issues brought up - \[[1](faq/design_and_concepts#does-hudis-use-of-wall-clock-timestamp-for-instants-pose-any-clock-skew-issues),[2](faq/design_and_concepts#hudis-commits-are-based-on-transaction-start-time-instead-of-completed-time-does-this-cause-data-loss-or-inconsistency-in-case-of-incremental-and-time-travel-queries)\], 
+- The issues brought up - \[[1](/faq/design_and_concepts#does-hudis-use-of-wall-clock-timestamp-for-instants-pose-any-clock-skew-issues),[2](/faq/design_and_concepts#hudis-commits-are-based-on-transaction-start-time-instead-of-completed-time-does-this-cause-data-loss-or-inconsistency-in-case-of-incremental-and-time-travel-queries)\], 
 relevant to this are not practically very important to users beyond good pedantic discussions, 
 - Wanting to do it alongside [non-blocking concurrency control](https://github.com/apache/hudi/pull/7907) in Hudi version 1.x.
 
diff --git a/website/src/pages/faq/general.md b/website/src/pages/faq/general.md
index 80005a23125b..c55addbad68f 100644
--- a/website/src/pages/faq/general.md
+++ b/website/src/pages/faq/general.md
@@ -62,7 +62,7 @@ Nonetheless, Hudi is designed very much like a database and provides similar fun
 
 ### How do I model the data stored in Hudi?
 
-When writing data into Hudi, you model the records like how you would on a key-value store - specify a key field (unique for a single partition/across table), a partition field (denotes partition to place key into) and preCombine/combine logic that specifies how to handle duplicates in a batch of records written. This model enables Hudi to enforce primary key constraints like you would get on a database table. See [here](writing_data) for an example.
+When writing data into Hudi, you model the records like how you would on a key-value store - specify a key field (unique for a single partition/across table), a partition field (denotes partition to place key into) and preCombine/combine logic that specifies how to handle duplicates in a batch of records written. This model enables Hudi to enforce primary key constraints like you would get on a database table. See [here](/docs/writing_data) for an example.
 
 When querying/reading data, Hudi just presents itself as a json-like hierarchical table, everyone is used to querying using Hive/Spark/Presto over Parquet/Json/Avro.
 
diff --git a/website/src/pages/faq/storage.md b/website/src/pages/faq/storage.md
index 77119450539e..a266405a489e 100644
--- a/website/src/pages/faq/storage.md
+++ b/website/src/pages/faq/storage.md
@@ -19,7 +19,7 @@ More details can be found [here](/docs/concepts/) and also [Design And Architect
 
 ### How do I migrate my data to Hudi?
 
-Hudi provides built in support for rewriting your entire table into Hudi one-time using the HDFSParquetImporter tool available from the hudi-cli . You could also do this via a simple read and write of the dataset using the Spark datasource APIs. Once migrated, writes can be performed using normal means discussed [here](faq/writing_tables#what-are-some-ways-to-write-a-hudi-table). This topic is discussed in detail [here](/docs/migration_guide/), including ways to doing partial migrations.
+Hudi provides built in support for rewriting your entire table into Hudi one-time using the HDFSParquetImporter tool available from the hudi-cli . You could also do this via a simple read and write of the dataset using the Spark datasource APIs. Once migrated, writes can be performed using normal means discussed [here](/faq/writing_tables#what-are-some-ways-to-write-a-hudi-table). This topic is discussed in detail [here](/docs/migration_guide/), including ways to doing partial migrations.
 
 ### How to convert an existing COW table to MOR?
 
@@ -170,13 +170,13 @@ After first write:
 
 | _hoodie_commit_time | _hoodie_commit_seqno | _hoodie_record_key | _hoodie_partition_path | _hoodie_file_name | Url | ts | uuid |
 | ---| ---| ---| ---| ---| ---| ---| --- |
-| 20220622204044318 | 20220622204044318... | 1 |  | 890aafc0-d897-44d... | [hudi.apache.com](http://hudi.apache.com) | 1 | 1 |
+| 20220622204044318 | 20220622204044318... | 1 |  | 890aafc0-d897-44d... | hudi.apache.org | 1 | 1 |
 
 After the second write:
 
 | _hoodie_commit_time | _hoodie_commit_seqno | _hoodie_record_key | _hoodie_partition_path | _hoodie_file_name | Url | ts | uuid |
 | ---| ---| ---| ---| ---| ---| ---| --- |
-| 20220622204044318 | 20220622204044318... | 1 |  | 890aafc0-d897-44d... | [hudi.apache.com](http://hudi.apache.com) | 1 | 1 |
+| 20220622204044318 | 20220622204044318... | 1 |  | 890aafc0-d897-44d... | hudi.apache.org | 1 | 1 |
 | 20220622204208997 | 20220622204208997... | 2 |  | 890aafc0-d897-44d... | null | 1 | 2 |
 
 ### Can I change keygenerator for an existing table?
diff --git a/website/src/pages/faq/table_services.md b/website/src/pages/faq/table_services.md
index 0c6db085f740..60d870fce3b2 100644
--- a/website/src/pages/faq/table_services.md
+++ b/website/src/pages/faq/table_services.md
@@ -36,8 +36,8 @@ Depending on how you write to Hudi these are the possible options currently.
   *   Please note it is not possible to disable async compaction for MOR table with spark structured streaming.
 *   Flink:
   *   Async compaction is enabled by default for Merge-On-Read table.
-  *   Offline compaction can be achieved by setting `compaction.async.enabled` to `false` and periodically running [Flink offline Compactor](compaction/#flink-offline-compaction). When running the offline compactor, one needs to ensure there are no active writes to the table.
-  *   Third option (highly recommended over the second one) is to schedule the compactions from the regular ingestion job and executing the compaction plans from an offline job. To achieve this set `compaction.async.enabled` to `false`, `compaction.schedule.enabled` to `true` and then run the [Flink offline Compactor](compaction/#flink-offline-compaction) periodically to execute the plans.
+  *   Offline compaction can be achieved by setting `compaction.async.enabled` to `false` and periodically running [Flink offline Compactor](/docs/compaction/#flink-offline-compaction). When running the offline compactor, one needs to ensure there are no active writes to the table.
+  *   Third option (highly recommended over the second one) is to schedule the compactions from the regular ingestion job and executing the compaction plans from an offline job. To achieve this set `compaction.async.enabled` to `false`, `compaction.schedule.enabled` to `true` and then run the [Flink offline Compactor](/docs/compaction/#flink-offline-compaction) periodically to execute the plans.
 
 ### How to disable all table services in case of multiple writers?
 
@@ -51,6 +51,6 @@ Hudi runs cleaner to remove old file versions as part of writing data either in
 
 Yes. Hudi provides the ability to post a callback notification about a write commit. You can use a http hook or choose to
 
-be notified via a Kafka/pulsar topic or plug in your own implementation to get notified. Please refer [here](platform_services_post_commit_callback)
+be notified via a Kafka/pulsar topic or plug in your own implementation to get notified. Please refer [here](/docs/platform_services_post_commit_callback)
 
 for details
diff --git a/website/src/pages/faq/writing_tables.md b/website/src/pages/faq/writing_tables.md
index c2c30abeb807..534ba34eb24f 100644
--- a/website/src/pages/faq/writing_tables.md
+++ b/website/src/pages/faq/writing_tables.md
@@ -7,7 +7,7 @@ keywords: [hudi, writing, reading]
 
 ### What are some ways to write a Hudi table?
 
-Typically, you obtain a set of partial updates/inserts from your source and issue [write operations](/docs/write_operations/) against a Hudi table. If you ingesting data from any of the standard sources like Kafka, or tailing DFS, the [delta streamer](/docs/hoodie_streaming_ingestion#hudi-streamer) tool is invaluable and provides an easy, self-managed solution to getting data written into Hudi. You can also write your own code to capture data from a custom source using the Spark datasour [...]
+Typically, you obtain a set of partial updates/inserts from your source and issue [write operations](/docs/write_operations/) against a Hudi table. If you ingesting data from any of the standard sources like Kafka, or tailing DFS, the [delta streamer](/docs/hoodie_streaming_ingestion#hudi-streamer) tool is invaluable and provides an easy, self-managed solution to getting data written into Hudi. You can also write your own code to capture data from a custom source using the Spark datasour [...]
 
 ### How is a Hudi writer job deployed?
 
@@ -69,15 +69,15 @@ As you could see, ([combineAndGetUpdateValue(), getInsertValue()](https://github
 
 ### How do I delete records in the dataset using Hudi?
 
-GDPR has made deletes a must-have tool in everyone's data management toolbox. Hudi supports both soft and hard deletes. For details on how to actually perform them, see [here](writing_data#deletes).
+GDPR has made deletes a must-have tool in everyone's data management toolbox. Hudi supports both soft and hard deletes. For details on how to actually perform them, see [here](/docs/writing_data#deletes).
 
 ### Should I need to worry about deleting all copies of the records in case of duplicates?
 
-No. Hudi removes all the copies of a record key when deletes are issued. Here is the long form explanation - Sometimes accidental user errors can lead to duplicates introduced into a Hudi table by either [concurrent inserts](faq/writing_tables#can-concurrent-inserts-cause-duplicates) or by [not deduping the input records](faq/writing_tables#can-single-writer-inserts-have-duplicates) for an insert operation. However, using the right index (e.g., in the default [Simple Index](https://githu [...]
+No. Hudi removes all the copies of a record key when deletes are issued. Here is the long form explanation - Sometimes accidental user errors can lead to duplicates introduced into a Hudi table by either [concurrent inserts](/faq/writing_tables#can-concurrent-inserts-cause-duplicates) or by [not deduping the input records](/faq/writing_tables#can-single-writer-inserts-have-duplicates) for an insert operation. However, using the right index (e.g., in the default [Simple Index](https://git [...]
 
 ### How does Hudi handle duplicate record keys in an input?
 
-When issuing an `upsert` operation on a table and the batch of records provided contains multiple entries for a given key, then all of them are reduced into a single final value by repeatedly calling payload class's [preCombine()](https://github.com/apache/hudi/blob/d3edac4612bde2fa9deca9536801dbc48961fb95/hudi-common/src/main/java/org/apache/hudi/common/model/HoodieRecordPayload.java#L40) method . By default, we pick the record with the greatest value (determined by calling .compareTo() [...]
+When issuing an `upsert` operation on a table and the batch of records provided contains multiple entries for a given key, then all of them are reduced into a single final value by repeatedly calling payload class's [preCombine()](https://github.com/apache/hudi/blob/d3edac4612bde2fa9deca9536801dbc48961fb95/hudi-common/src/main/java/org/apache/hudi/common/model/HoodieRecordPayload.java#L40) method . By default, we pick the record with the greatest value (determined by calling .compareTo() [...]
 
 For an insert or bulk_insert operation, no such pre-combining is performed. Thus, if your input contains duplicates, the table would also contain duplicates. If you don't want duplicate records either issue an **upsert** or consider specifying option to de-duplicate input in either datasource using [`hoodie.datasource.write.insert.drop.duplicates`](/docs/configurations#hoodiedatasourcewriteinsertdropduplicates) & [`hoodie.combine.before.insert`](/docs/configurations/#hoodiecombinebeforei [...]
 
@@ -184,7 +184,7 @@ No, Hudi does not expose uncommitted files/blocks to the readers. Further, Hudi
 
 ### How are conflicts detected in Hudi between multiple writers?
 
-Hudi employs [optimistic concurrency control](concurrency_control) between writers, while implementing MVCC based concurrency control between writers and the table services. Concurrent writers to the same table need to be configured with the same lock provider configuration, to safely perform writes. By default (implemented in “[SimpleConcurrentFileWritesConflictResolutionStrategy](https://github.com/apache/hudi/blob/master/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/cli [...]
+Hudi employs [optimistic concurrency control](/docs/concurrency_control) between writers, while implementing MVCC based concurrency control between writers and the table services. Concurrent writers to the same table need to be configured with the same lock provider configuration, to safely perform writes. By default (implemented in “[SimpleConcurrentFileWritesConflictResolutionStrategy](https://github.com/apache/hudi/blob/master/hudi-client/hudi-client-common/src/main/java/org/apache/hu [...]
 
 ### Can single-writer inserts have duplicates?
