(hudi) branch asf-site updated: docs: fix broken links in Hudi website since 0.14.0 (#14192)

xushiyan Wed, 05 Nov 2025 11:28:43 -0800

This is an automated email from the ASF dual-hosted git repository.

xushiyan pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git



The following commit(s) were added to refs/heads/asf-site by this push:
     new cdfb881e643f docs: fix broken links in Hudi website since 0.14.0 (#14192)
cdfb881e643f is described below

commit cdfb881e643f14ba0168efa58640bea10d04c3fe
Author: deepakpanda93 <[email protected]>
AuthorDate: Thu Nov 6 00:58:29 2025 +0530

    docs: fix broken links in Hudi website since 0.14.0 (#14192)
---
 website/docs/comparison.md                         |  2 +-
 website/docs/configurations.md                     |  2 +-
 website/docs/hudi_stack.md                         |  2 +-
 website/docs/metadata.md                           |  2 +-
 website/docs/overview.mdx                          |  2 +-
 website/docs/structure.md                          |  2 +-
 website/docs/syncing_datahub.md                    |  2 +-
 website/docs/troubleshooting.md                    |  2 +-
 website/docs/tuning-guide.md                       |  2 +-
 .../versioned_docs/version-0.14.0/compaction.md    |  2 +-
 .../versioned_docs/version-0.14.0/comparison.md    |  2 +-
 .../version-0.14.0/configurations.md               |  2 +-
 website/versioned_docs/version-0.14.0/faq.md       |  2 +-
 website/versioned_docs/version-0.14.0/metadata.md  |  2 +-
 website/versioned_docs/version-0.14.0/overview.mdx |  2 +-
 website/versioned_docs/version-0.14.0/s3_hoodie.md |  2 +-
 .../version-0.14.0/schema_evolution.md             |  2 +-
 .../versioned_docs/version-0.14.0/sql_queries.md   |  2 +-
 website/versioned_docs/version-0.14.0/structure.md |  2 +-
 .../version-0.14.0/syncing_datahub.md              |  2 +-
 .../version-0.14.0/troubleshooting.md              |  2 +-
 .../versioned_docs/version-0.14.0/tuning-guide.md  |  2 +-
 website/versioned_docs/version-0.14.0/use_cases.md |  2 +-
 .../versioned_docs/version-0.14.1/compaction.md    |  2 +-
 .../versioned_docs/version-0.14.1/comparison.md    |  2 +-
 .../version-0.14.1/configurations.md               |  2 +-
 .../versioned_docs/version-0.14.1/faq_storage.md   |  2 +-
 website/versioned_docs/version-0.14.1/metadata.md  |  2 +-
 website/versioned_docs/version-0.14.1/overview.mdx |  2 +-
 website/versioned_docs/version-0.14.1/s3_hoodie.md |  2 +-
 .../versioned_docs/version-0.14.1/sql_queries.md   |  2 +-
 website/versioned_docs/version-0.14.1/structure.md |  2 +-
 .../version-0.14.1/syncing_datahub.md              |  2 +-
 .../versioned_docs/version-0.14.1/table_types.md   |  2 +-
 .../version-0.14.1/troubleshooting.md              |  2 +-
 .../versioned_docs/version-0.14.1/tuning-guide.md  |  2 +-
 website/versioned_docs/version-0.14.1/use_cases.md |  2 +-
 .../versioned_docs/version-0.15.0/compaction.md    |  2 +-
 .../versioned_docs/version-0.15.0/comparison.md    |  2 +-
 .../version-0.15.0/configurations.md               |  2 +-
 .../versioned_docs/version-0.15.0/faq_storage.md   |  2 +-
 website/versioned_docs/version-0.15.0/metadata.md  |  2 +-
 website/versioned_docs/version-0.15.0/overview.mdx |  2 +-
 .../version-0.15.0/reading_tables_batch_reads.md   |  2 +-
 website/versioned_docs/version-0.15.0/s3_hoodie.md |  2 +-
 .../versioned_docs/version-0.15.0/sql_queries.md   |  2 +-
 website/versioned_docs/version-0.15.0/structure.md |  2 +-
 .../version-0.15.0/syncing_datahub.md              |  2 +-
 .../versioned_docs/version-0.15.0/table_types.md   |  2 +-
 .../version-0.15.0/troubleshooting.md              |  2 +-
 .../versioned_docs/version-0.15.0/tuning-guide.md  |  2 +-
 website/versioned_docs/version-0.15.0/use_cases.md |  2 +-
 website/versioned_docs/version-1.0.0/compaction.md |  2 +-
 website/versioned_docs/version-1.0.0/comparison.md |  2 +-
 .../versioned_docs/version-1.0.0/configurations.md |  2 +-
 .../versioned_docs/version-1.0.0/faq_storage.md    |  2 +-
 website/versioned_docs/version-1.0.0/hudi_stack.md |  2 +-
 website/versioned_docs/version-1.0.0/metadata.md   |  2 +-
 website/versioned_docs/version-1.0.0/overview.mdx  |  2 +-
 .../version-1.0.0/reading_tables_batch_reads.md    |  2 +-
 website/versioned_docs/version-1.0.0/s3_hoodie.md  |  2 +-
 .../versioned_docs/version-1.0.0/sql_queries.md    |  2 +-
 website/versioned_docs/version-1.0.0/structure.md  |  2 +-
 .../version-1.0.0/syncing_datahub.md               |  2 +-
 .../versioned_docs/version-1.0.0/table_types.md    |  2 +-
 .../version-1.0.0/troubleshooting.md               |  2 +-
 .../versioned_docs/version-1.0.0/tuning-guide.md   |  2 +-
 website/versioned_docs/version-1.0.1/compaction.md |  2 +-
 website/versioned_docs/version-1.0.1/comparison.md |  4 ++--
 .../versioned_docs/version-1.0.1/configurations.md |  2 +-
 .../versioned_docs/version-1.0.1/faq_storage.md    |  2 +-
 website/versioned_docs/version-1.0.1/hudi_stack.md |  2 +-
 website/versioned_docs/version-1.0.1/metadata.md   |  2 +-
 website/versioned_docs/version-1.0.1/overview.mdx  |  2 +-
 .../version-1.0.1/reading_tables_batch_reads.md    |  2 +-
 website/versioned_docs/version-1.0.1/s3_hoodie.md  |  2 +-
 .../versioned_docs/version-1.0.1/sql_queries.md    |  2 +-
 website/versioned_docs/version-1.0.1/structure.md  |  2 +-
 .../version-1.0.1/syncing_datahub.md               |  2 +-
 .../version-1.0.1/troubleshooting.md               |  2 +-
 .../versioned_docs/version-1.0.1/tuning-guide.md   |  2 +-
 website/versioned_docs/version-1.0.2/compaction.md |  2 +-
 website/versioned_docs/version-1.0.2/comparison.md |  2 +-
 .../versioned_docs/version-1.0.2/configurations.md |  2 +-
 .../versioned_docs/version-1.0.2/faq_storage.md    |  2 +-
 website/versioned_docs/version-1.0.2/hudi_stack.md | 26 +++++++++++-----------
 website/versioned_docs/version-1.0.2/metadata.md   |  2 +-
 website/versioned_docs/version-1.0.2/overview.mdx  |  2 +-
 website/versioned_docs/version-1.0.2/s3_hoodie.md  |  2 +-
 .../versioned_docs/version-1.0.2/sql_queries.md    |  2 +-
 website/versioned_docs/version-1.0.2/structure.md  |  2 +-
 .../version-1.0.2/syncing_datahub.md               |  2 +-
 .../versioned_docs/version-1.0.2/table_types.md    |  2 +-
 .../version-1.0.2/troubleshooting.md               |  2 +-
 .../versioned_docs/version-1.0.2/tuning-guide.md   |  2 +-
 95 files changed, 108 insertions(+), 108 deletions(-)

diff --git a/website/docs/comparison.md b/website/docs/comparison.md
index 681b359a4de8..0bcce2ace532 100644
--- a/website/docs/comparison.md
+++ b/website/docs/comparison.md
@@ -52,5 +52,5 @@ of PrestoDB/SparkSQL/Hive for your queries.
 
 More advanced use cases revolve around the concepts of [incremental 
processing](https://www.oreilly.com/ideas/ubers-case-for-incremental-processing-on-hadoop),
 which effectively
 uses Hudi even inside the `processing` engine to speed up typical batch 
pipelines. For e.g: Hudi can be used as a state store inside a processing DAG 
(similar
-to how 
[rocksDB](https://ci.apache.org/projects/flink/flink-docs-release-1.2/ops/state_backends#the-rocksdbstatebackend)
 is used by Flink). This is an item on the roadmap
+to how 
[rocksDB](https://ci.apache.org/projects/flink/flink-docs-release-1.2/ops/state_backends.html#the-rocksdbstatebackend)
 is used by Flink). This is an item on the roadmap
 and will eventually happen as a [Beam 
Runner](https://issues.apache.org/jira/browse/HUDI-60)
diff --git a/website/docs/configurations.md b/website/docs/configurations.md
index 2e05446ae5a7..022b2b172f23 100644
--- a/website/docs/configurations.md
+++ b/website/docs/configurations.md
@@ -1851,7 +1851,7 @@ These set of configs are used for Hudi Streamer utility 
which provides the way t
 | [hoodie.streamer.sample.writes.size](#hoodiestreamersamplewritessize)        
                                      | 5000    | Number of records to sample 
from the first write. To improve the estimation's accuracy, for smaller or more 
compressable record size, set the sample size bigger. For bigger or less 
compressable record size, set smaller.<br />`Config Param: 
SAMPLE_WRITES_SIZE`<br />`Since Version: 0.14.0`                                
                                            [...]
 | 
[hoodie.streamer.source.kafka.append.offsets](#hoodiestreamersourcekafkaappendoffsets)
                             | false   | When enabled, appends kafka offset 
info like source offset(_hoodie_kafka_source_offset), partition 
(_hoodie_kafka_source_partition) and timestamp (_hoodie_kafka_source_timestamp) 
to the records. By default its disabled and no kafka offsets are added<br 
/>`Config Param: KAFKA_APPEND_OFFSETS`                                          
                               [...]
 | 
[hoodie.streamer.source.sanitize.invalid.char.mask](#hoodiestreamersourcesanitizeinvalidcharmask)
                  | __      | Defines the character sequence that replaces 
invalid characters in schema field names if 
hoodie.streamer.source.sanitize.invalid.schema.field.names is enabled.<br 
/>`Config Param: SCHEMA_FIELD_NAME_INVALID_CHAR_MASK`                           
                                                                                
                                         [...]
-| 
[hoodie.streamer.source.sanitize.invalid.schema.field.names](#hoodiestreamersourcesanitizeinvalidschemafieldnames)
 | false   | Sanitizes names of invalid schema fields both in the data read 
from source and also in the schema Replaces invalid characters with 
hoodie.streamer.source.sanitize.invalid.char.mask. Invalid characters are by 
goes by avro naming convention 
(https://avro.apache.org/docs/current/spec.html#names).<br />`Config Param: 
SANITIZE_SCHEMA_FIELD_NAMES`                     [...]
+| 
[hoodie.streamer.source.sanitize.invalid.schema.field.names](#hoodiestreamersourcesanitizeinvalidschemafieldnames)
 | false   | Sanitizes names of invalid schema fields both in the data read 
from source and also in the schema Replaces invalid characters with 
hoodie.streamer.source.sanitize.invalid.char.mask. Invalid characters are by 
goes by avro naming convention 
(https://avro.apache.org/docs/++version++/specification/#names).<br />`Config 
Param: SANITIZE_SCHEMA_FIELD_NAMES`            [...]
 ---
 
 
diff --git a/website/docs/hudi_stack.md b/website/docs/hudi_stack.md
index d28231244187..472a1fe374e3 100644
--- a/website/docs/hudi_stack.md
+++ b/website/docs/hudi_stack.md
@@ -57,7 +57,7 @@ File Slices. File groups contain multiple versions of File 
Slices and are split
 the file-group is uniquely identified by the write that created its base file 
or the first log file, which helps order the File Slices.
 
 - **Metadata Table** : Implemented as another merge-on-read Hudi table, the 
[metadata table](./metadata) efficiently handles quick updates with low write 
amplification. 
-It leverages a 
[SSTable](https://cassandra.apache.org/doc/stable/cassandra/architecture/storage_engine.html#sstables)
 based file format for quick, indexed key lookups, 
+It leverages a 
[SSTable](https://cassandra.apache.org/doc/stable/cassandra/architecture/storage-engine.html#sstables)
 based file format for quick, indexed key lookups, 
 storing vital information like file paths, column statistics and schema. This 
approach streamlines operations by reducing the necessity for expensive cloud 
file listings. 
 
 Hudi’s approach of recording updates into Log Files is more efficient and 
involves low merge overhead than systems like Hive ACID, where merging all 
delta records against 
diff --git a/website/docs/metadata.md b/website/docs/metadata.md
index 8f3b403112ac..fe8827ebeec5 100644
--- a/website/docs/metadata.md
+++ b/website/docs/metadata.md
@@ -46,7 +46,7 @@ is tracked using internal tables. This approach provides the 
following advantage
 
 Following are the different types of metadata currently supported.
 
-- ***[files 
listings](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+15%3A+HUDI+File+Listing+Improvements)***:
 
+- ***[files 
listings](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=147427331)***:
 
   Stored as *files* partition in the metadata table. Contains file information 
such as file name, size, and active state
   for each partition in the data table, along with list of all partitions in 
the table. Improves the files listing performance 
   by avoiding direct storage calls such as *exists, listStatus* and 
*listFiles* on the data table.
diff --git a/website/docs/overview.mdx b/website/docs/overview.mdx
index bb8910f9c7ed..1e55d6916f3a 100644
--- a/website/docs/overview.mdx
+++ b/website/docs/overview.mdx
@@ -25,7 +25,7 @@ but it also allows you to create efficient incremental batch 
pipelines. Apache H
 Hudi’s advanced performance optimizations, make analytical queries/pipelines 
faster with any of the popular query engines including, Apache Spark, Flink, 
Presto, Trino, Hive, etc.
 
 Read the docs for more [use case descriptions](/docs/use_cases) and check out 
[who's using Hudi](/powered-by), to see how some of the
-largest data lakes in the world including 
[Uber](https://eng.uber.com/uber-big-data-platform/), 
[Amazon](https://aws.amazon.com/blogs/big-data/how-amazon-transportation-service-enabled-near-real-time-event-analytics-at-petabyte-scale-using-aws-glue-with-apache-hudi/),
+largest data lakes in the world including 
[Uber](https://www.uber.com/en-IN/blog/uber-big-data-platform/), 
[Amazon](https://aws.amazon.com/blogs/big-data/how-amazon-transportation-service-enabled-near-real-time-event-analytics-at-petabyte-scale-using-aws-glue-with-apache-hudi/),
 
[ByteDance](http://hudi.apache.org/blog/2021/09/01/building-eb-level-data-lake-using-hudi-at-bytedance),
 [Robinhood](https://s.apache.org/hudi-robinhood-talk) and more are 
transforming their production data lakes with Hudi.
 
diff --git a/website/docs/structure.md b/website/docs/structure.md
index 137520dd2a54..0e15e353c30a 100644
--- a/website/docs/structure.md
+++ b/website/docs/structure.md
@@ -9,7 +9,7 @@ Hudi (pronounced “Hoodie”) ingests & manages storage of large 
analytical tab
 
  * **Read Optimized query** - Provides excellent query performance on pure 
columnar storage, much like plain [Parquet](https://parquet.apache.org/) tables.
  * **Incremental query** - Provides a change stream out of the dataset to feed 
downstream jobs/ETLs.
- * **Snapshot query** - Provides queries on real-time data, using a 
combination of columnar & row based storage (e.g Parquet + 
[Avro](http://avro.apache.org/docs/current/mr))
+ * **Snapshot query** - Provides queries on real-time data, using a 
combination of columnar & row based storage (e.g Parquet + 
[Avro](https://avro.apache.org/docs/++version++/mapreduce-guide/))
 
 <figure>
     <img className="docimage" 
src={require("/assets/images/hudi_intro_1.png").default} alt="hudi_intro_1.png" 
/>
diff --git a/website/docs/syncing_datahub.md b/website/docs/syncing_datahub.md
index 52b8d1e3e49e..39a4ea624864 100644
--- a/website/docs/syncing_datahub.md
+++ b/website/docs/syncing_datahub.md
@@ -3,7 +3,7 @@ title: DataHub
 keywords: [hudi, datahub, sync]
 ---
 
-[DataHub](https://datahubproject.io/) is a rich metadata platform that 
supports features like data discovery, data
+[DataHub](https://datahub.com/) is a rich metadata platform that supports 
features like data discovery, data
 obeservability, federated governance, etc.
 
 Since Hudi 0.11.0, you can now sync to a DataHub instance by setting 
`DataHubSyncTool` as one of the sync tool classes
diff --git a/website/docs/troubleshooting.md b/website/docs/troubleshooting.md
index 4696694d41d8..47de1002beae 100644
--- a/website/docs/troubleshooting.md
+++ b/website/docs/troubleshooting.md
@@ -40,7 +40,7 @@ You can increase `hoodie.commits.archival.batch` moving 
forward to increase the
 In addition, you can increase the difference between the 2 watermark 
configurations : `hoodie.keep.max.commits` (default : 30) 
 and `hoodie.keep.min.commits` (default : 20). This way, you can reduce the 
number of archive files created and also 
 at the same time increase the number of metadata archived per archive file. 
Note that post 0.7.0 release where we are 
-adding consolidated Hudi metadata 
([RFC-15](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+15%3A+HUDI+File+Listing+Improvements)),
 
+adding consolidated Hudi metadata 
([RFC-15](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=147427331)),
 
 the follow up work would involve re-organizing archival metadata so that we 
can do periodic compactions to control 
 file-sizing of these archive files.
 
diff --git a/website/docs/tuning-guide.md b/website/docs/tuning-guide.md
index 4a1f72f1b05f..107fa6e67c70 100644
--- a/website/docs/tuning-guide.md
+++ b/website/docs/tuning-guide.md
@@ -57,7 +57,7 @@ When upsert large input data, hudi spills part of input data 
to disk when reach
 
 ### How to tune shuffle parallelism of Hudi jobs ?
 
-First, let's understand what the term parallelism means in the context of Hudi 
jobs. For any Hudi job using Spark, parallelism equals to the number of spark 
partitions that should be generated for a particular stage in the DAG. To 
understand more about spark partitions, read this 
[article](https://www.dezyre.com/article/how-data-partitioning-in-spark-helps-achieve-more-parallelism/297).
 In spark, each spark partition is mapped to a spark task that can be executed 
on an executor. Typicall [...]
+First, let's understand what the term parallelism means in the context of Hudi 
jobs. For any Hudi job using Spark, parallelism equals to the number of spark 
partitions that should be generated for a particular stage in the DAG. To 
understand more about spark partitions, read this 
[article](https://www.projectpro.io/article/how-data-partitioning-in-spark-helps-achieve-more-parallelism/297).
 In spark, each spark partition is mapped to a spark task that can be executed 
on an executor. Typic [...]
 
 (Spark Application → N Spark Jobs → M Spark Stages → T Spark Tasks) on (E 
executors with C cores)
 
diff --git a/website/versioned_docs/version-0.14.0/compaction.md 
b/website/versioned_docs/version-0.14.0/compaction.md
index f7a01c286f03..d9238bce6428 100644
--- a/website/versioned_docs/version-0.14.0/compaction.md
+++ b/website/versioned_docs/version-0.14.0/compaction.md
@@ -13,7 +13,7 @@ not applicable to Copy On Write(COW) tables and only applies 
to MOR tables.
 
 ### Why MOR tables need compaction?
 To understand the significance of compaction in MOR tables, it is helpful to 
understand the MOR table layout first. In Hudi, 
-data is organized in terms of [file 
groups](https://hudi.apache.org/docs/file_layouts/). Each file group in a MOR 
table 
+data is organized in terms of [file groups](file_layouts). Each file group in 
a MOR table 
 consists of a base file and one or more log files. Typically, during writes, 
inserts are stored in the base file, and updates 
 are appended to log files.
 
diff --git a/website/versioned_docs/version-0.14.0/comparison.md 
b/website/versioned_docs/version-0.14.0/comparison.md
index 681b359a4de8..0bcce2ace532 100644
--- a/website/versioned_docs/version-0.14.0/comparison.md
+++ b/website/versioned_docs/version-0.14.0/comparison.md
@@ -52,5 +52,5 @@ of PrestoDB/SparkSQL/Hive for your queries.
 
 More advanced use cases revolve around the concepts of [incremental 
processing](https://www.oreilly.com/ideas/ubers-case-for-incremental-processing-on-hadoop),
 which effectively
 uses Hudi even inside the `processing` engine to speed up typical batch 
pipelines. For e.g: Hudi can be used as a state store inside a processing DAG 
(similar
-to how 
[rocksDB](https://ci.apache.org/projects/flink/flink-docs-release-1.2/ops/state_backends#the-rocksdbstatebackend)
 is used by Flink). This is an item on the roadmap
+to how 
[rocksDB](https://ci.apache.org/projects/flink/flink-docs-release-1.2/ops/state_backends.html#the-rocksdbstatebackend)
 is used by Flink). This is an item on the roadmap
 and will eventually happen as a [Beam 
Runner](https://issues.apache.org/jira/browse/HUDI-60)
diff --git a/website/versioned_docs/version-0.14.0/configurations.md 
b/website/versioned_docs/version-0.14.0/configurations.md
index 3736351b3f34..447e99ecd68d 100644
--- a/website/versioned_docs/version-0.14.0/configurations.md
+++ b/website/versioned_docs/version-0.14.0/configurations.md
@@ -1578,7 +1578,7 @@ These set of configs are used for Hudi Streamer utility 
which provides the way t
 | [hoodie.streamer.sample.writes.size](#hoodiestreamersamplewritessize)        
                                      | 5000    | Number of records to sample 
from the first write. To improve the estimation's accuracy, for smaller or more 
compressable record size, set the sample size bigger. For bigger or less 
compressable record size, set smaller.<br />`Config Param: 
SAMPLE_WRITES_SIZE`<br />`Since Version: 0.14.0`                                
                                            [...]
 | 
[hoodie.streamer.source.kafka.append.offsets](#hoodiestreamersourcekafkaappendoffsets)
                             | false   | When enabled, appends kafka offset 
info like source offset(_hoodie_kafka_source_offset), partition 
(_hoodie_kafka_source_partition) and timestamp (_hoodie_kafka_source_timestamp) 
to the records. By default its disabled and no kafka offsets are added<br 
/>`Config Param: KAFKA_APPEND_OFFSETS`                                          
                               [...]
 | 
[hoodie.streamer.source.sanitize.invalid.char.mask](#hoodiestreamersourcesanitizeinvalidcharmask)
                  | __      | Defines the character sequence that replaces 
invalid characters in schema field names if 
hoodie.streamer.source.sanitize.invalid.schema.field.names is enabled.<br 
/>`Config Param: SCHEMA_FIELD_NAME_INVALID_CHAR_MASK`                           
                                                                                
                                         [...]
-| 
[hoodie.streamer.source.sanitize.invalid.schema.field.names](#hoodiestreamersourcesanitizeinvalidschemafieldnames)
 | false   | Sanitizes names of invalid schema fields both in the data read 
from source and also in the schema Replaces invalid characters with 
hoodie.streamer.source.sanitize.invalid.char.mask. Invalid characters are by 
goes by avro naming convention 
(https://avro.apache.org/docs/current/spec.html#names).<br />`Config Param: 
SANITIZE_SCHEMA_FIELD_NAMES`                     [...]
+| 
[hoodie.streamer.source.sanitize.invalid.schema.field.names](#hoodiestreamersourcesanitizeinvalidschemafieldnames)
 | false   | Sanitizes names of invalid schema fields both in the data read 
from source and also in the schema Replaces invalid characters with 
hoodie.streamer.source.sanitize.invalid.char.mask. Invalid characters are by 
goes by avro naming convention 
(https://avro.apache.org/docs/++version++/specification/#names).<br />`Config 
Param: SANITIZE_SCHEMA_FIELD_NAMES`            [...]
 ---
 
 
diff --git a/website/versioned_docs/version-0.14.0/faq.md 
b/website/versioned_docs/version-0.14.0/faq.md
index 59984161e6a5..d840935e5c6e 100644
--- a/website/versioned_docs/version-0.14.0/faq.md
+++ b/website/versioned_docs/version-0.14.0/faq.md
@@ -474,7 +474,7 @@ The indexing component is a key part of the Hudi writing 
and it maps a given rec
 Hudi supports a few options for indexing as below
 
 *   _HoodieBloomIndex_ : Uses a bloom filter and ranges information placed in 
the footer of parquet/base files (and soon log files as well)
-*   _HoodieGlobalBloomIndex_ : The non global indexing only enforces 
uniqueness of a key inside a single partition i.e the user is expected to know 
the partition under which a given record key is stored. This helps the indexing 
scale very well for even [very large 
datasets](https://eng.uber.com/uber-big-data-platform/). However, in some 
cases, it might be necessary instead to do the de-duping/enforce uniqueness 
across all partitions and the global bloom index does exactly that. If this i 
[...]
+*   _HoodieGlobalBloomIndex_ : The non global indexing only enforces 
uniqueness of a key inside a single partition i.e the user is expected to know 
the partition under which a given record key is stored. This helps the indexing 
scale very well for even [very large 
datasets](https://www.uber.com/en-IN/blog/uber-big-data-platform/). However, in 
some cases, it might be necessary instead to do the de-duping/enforce 
uniqueness across all partitions and the global bloom index does exactly that 
[...]
 *   _HBaseIndex_ : Apache HBase is a key value store, typically found in close 
proximity to HDFS. You can also store the index inside HBase, which could be 
handy if you are already operating HBase.
 *   _HoodieSimpleIndex (default)_ : A simple index which reads interested 
fields (record key and partition path) from base files and joins with incoming 
records to find the tagged location.
 *   _HoodieGlobalSimpleIndex_ : Global version of Simple Index, where in 
uniqueness is on record key across entire table.
diff --git a/website/versioned_docs/version-0.14.0/metadata.md 
b/website/versioned_docs/version-0.14.0/metadata.md
index 48a7047409ca..50b99b907286 100644
--- a/website/versioned_docs/version-0.14.0/metadata.md
+++ b/website/versioned_docs/version-0.14.0/metadata.md
@@ -66,7 +66,7 @@ mechanism and is built on the following core principles:
 
 Following are the different indices currently available under the metadata 
table.
 
-- ***[files 
index](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+15%3A+HUDI+File+Listing+Improvements)***:
 
+- ***[files 
index](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=147427331)***:
 
   Stored as *files* partition in the metadata table. Contains file information 
such as file name, size, and active state
   for each partition in the data table. Improves the files listing performance 
by avoiding direct file system calls such
   as *exists, listStatus* and *listFiles* on the data table.
diff --git a/website/versioned_docs/version-0.14.0/overview.mdx 
b/website/versioned_docs/version-0.14.0/overview.mdx
index 71a84591ce57..ed4e520d0ce8 100644
--- a/website/versioned_docs/version-0.14.0/overview.mdx
+++ b/website/versioned_docs/version-0.14.0/overview.mdx
@@ -20,7 +20,7 @@ and [concurrency](/docs/concurrency_control) all while 
keeping your data in open
 
 Not only is Apache Hudi great for streaming workloads, but it also allows you 
to create efficient incremental batch pipelines.
 Read the docs for more [use case descriptions](/docs/use_cases) and check out 
[who's using Hudi](/powered-by), to see how some of the
-largest data lakes in the world including 
[Uber](https://eng.uber.com/uber-big-data-platform/), 
[Amazon](https://aws.amazon.com/blogs/big-data/how-amazon-transportation-service-enabled-near-real-time-event-analytics-at-petabyte-scale-using-aws-glue-with-apache-hudi/),
+largest data lakes in the world including 
[Uber](https://www.uber.com/en-IN/blog/uber-big-data-platform/), 
[Amazon](https://aws.amazon.com/blogs/big-data/how-amazon-transportation-service-enabled-near-real-time-event-analytics-at-petabyte-scale-using-aws-glue-with-apache-hudi/),
 
[ByteDance](http://hudi.apache.org/blog/2021/09/01/building-eb-level-data-lake-using-hudi-at-bytedance),
 [Robinhood](https://s.apache.org/hudi-robinhood-talk) and more are 
transforming their production data lakes with Hudi.
 
diff --git a/website/versioned_docs/version-0.14.0/s3_hoodie.md 
b/website/versioned_docs/version-0.14.0/s3_hoodie.md
index b990add7d4b7..5faad6e62be9 100644
--- a/website/versioned_docs/version-0.14.0/s3_hoodie.md
+++ b/website/versioned_docs/version-0.14.0/s3_hoodie.md
@@ -88,7 +88,7 @@ AWS glue data libraries are needed if AWS glue data is used
 
 ## AWS S3 Versioned Bucket
 
-With versioned buckets any object deleted creates a [Delete 
Marker](https://docs.aws.amazon.com/AmazonS3/latest/userguide/DeleteMarker.html),
 as Hudi cleans up files using [Cleaner 
utility](https://hudi.apache.orghoodie_cleaner) the number of Delete Markers 
increases over time.
+With versioned buckets any object deleted creates a [Delete 
Marker](https://docs.aws.amazon.com/AmazonS3/latest/userguide/DeleteMarker.html),
 as Hudi cleans up files using [Cleaner utility](hoodie_cleaner) the number of 
Delete Markers increases over time.
 It is important to configure the [Lifecycle 
Rule](https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-lifecycle-mgmt.html)
 correctly
 to clean up these delete markers as the List operation can choke if the number 
of delete markers reaches 1000.
 We recommend cleaning up Delete Markers after 1 day in Lifecycle Rule.
\ No newline at end of file
diff --git a/website/versioned_docs/version-0.14.0/schema_evolution.md 
b/website/versioned_docs/version-0.14.0/schema_evolution.md
index 49c91ff02902..4f93e85ddcb7 100755
--- a/website/versioned_docs/version-0.14.0/schema_evolution.md
+++ b/website/versioned_docs/version-0.14.0/schema_evolution.md
@@ -29,7 +29,7 @@ type reconciliations. The following table summarizes the 
schema changes compatib
 | Add a new complex type field with default (map and array)                    
    | Yes      | Yes     |                                                      
                                                                                
                                                                                
                                                                         |
 | Add a new nullable column and change the ordering of fields                  
    | No       | No      | Write succeeds but read fails if the write with 
evolved schema updated only some of the base files but not all. Currently, Hudi 
does not maintain a schema registry with history of changes across base files. 
Nevertheless, if the upsert touched all base files then the read will succeed. |
 | Add a custom nullable Hudi meta column, e.g. `_hoodie_meta_col`              
    | Yes      | Yes     |                                                      
                                                                                
                                                                                
                                                                         |
-| Promote datatype from `int` to `long` for a field at root level              
    | Yes      | Yes     | For other types, Hudi supports promotion as 
specified in [Avro schema 
resolution](http://avro.apache.org/docs/current/spec#Schema+Resolution).        
                                                                                
                                                        |
+| Promote datatype from `int` to `long` for a field at root level              
    | Yes      | Yes     | For other types, Hudi supports promotion as 
specified in [Avro schema 
resolution](https://avro.apache.org/docs/++version++/specification/#schema-resolution).
                                                                                
                                                                |
 | Promote datatype from `int` to `long` for a nested field                     
    | Yes      | Yes     |
 | Promote datatype from `int` to `long` for a complex type (value of map or 
array) | Yes      | Yes     |                                                   
                                                                                
                                                                                
                                                                            |
 | Add a new non-nullable column at root level at the end                       
    | No       | No      | In case of MOR table with Spark data source, write 
succeeds but read fails. As a **workaround**, you can make the field nullable.  
                                                                                
                                                                           |
diff --git a/website/versioned_docs/version-0.14.0/sql_queries.md b/website/versioned_docs/version-0.14.0/sql_queries.md
index b909287ae05f..7aef29612f4b 100644
--- a/website/versioned_docs/version-0.14.0/sql_queries.md
+++ b/website/versioned_docs/version-0.14.0/sql_queries.md
@@ -329,7 +329,7 @@ for more details.
 
 Copy on Write Tables in Hudi version 0.10.0 can be queried via Doris external 
tables starting from Doris version 1.1.
 Please refer
-to [Doris Hudi 
Catalog](https://doris.apache.org/docs/lakehouse/datalake-analytics/hudi/)
+to [Doris Hudi 
Catalog](https://doris.apache.org/docs/3.x/lakehouse/catalogs/hudi-catalog)
 for more details on the setup.
 
 :::note
diff --git a/website/versioned_docs/version-0.14.0/structure.md b/website/versioned_docs/version-0.14.0/structure.md
index 137520dd2a54..0e15e353c30a 100644
--- a/website/versioned_docs/version-0.14.0/structure.md
+++ b/website/versioned_docs/version-0.14.0/structure.md
@@ -9,7 +9,7 @@ Hudi (pronounced “Hoodie”) ingests & manages storage of large analytical tab
 
  * **Read Optimized query** - Provides excellent query performance on pure 
columnar storage, much like plain [Parquet](https://parquet.apache.org/) tables.
  * **Incremental query** - Provides a change stream out of the dataset to feed 
downstream jobs/ETLs.
- * **Snapshot query** - Provides queries on real-time data, using a 
combination of columnar & row based storage (e.g Parquet + 
[Avro](http://avro.apache.org/docs/current/mr))
+ * **Snapshot query** - Provides queries on real-time data, using a 
combination of columnar & row based storage (e.g Parquet + 
[Avro](https://avro.apache.org/docs/++version++/mapreduce-guide/))
 
 <figure>
     <img className="docimage" 
src={require("/assets/images/hudi_intro_1.png").default} alt="hudi_intro_1.png" 
/>
diff --git a/website/versioned_docs/version-0.14.0/syncing_datahub.md b/website/versioned_docs/version-0.14.0/syncing_datahub.md
index 40fcd1d1891e..952249d3ff68 100644
--- a/website/versioned_docs/version-0.14.0/syncing_datahub.md
+++ b/website/versioned_docs/version-0.14.0/syncing_datahub.md
@@ -3,7 +3,7 @@ title: DataHub
 keywords: [hudi, datahub, sync]
 ---
 
-[DataHub](https://datahubproject.io/) is a rich metadata platform that 
supports features like data discovery, data
+[DataHub](https://datahub.com/) is a rich metadata platform that supports 
features like data discovery, data
 obeservability, federated governance, etc.
 
 Since Hudi 0.11.0, you can now sync to a DataHub instance by setting 
`DataHubSyncTool` as one of the sync tool classes
diff --git a/website/versioned_docs/version-0.14.0/troubleshooting.md b/website/versioned_docs/version-0.14.0/troubleshooting.md
index aaa3f4feb635..13d3f3ac98af 100644
--- a/website/versioned_docs/version-0.14.0/troubleshooting.md
+++ b/website/versioned_docs/version-0.14.0/troubleshooting.md
@@ -40,7 +40,7 @@ You can increase `hoodie.commits.archival.batch` moving forward to increase the
 In addition, you can increase the difference between the 2 watermark 
configurations : `hoodie.keep.max.commits` (default : 30) 
 and `hoodie.keep.min.commits` (default : 20). This way, you can reduce the 
number of archive files created and also 
 at the same time increase the number of metadata archived per archive file. 
Note that post 0.7.0 release where we are 
-adding consolidated Hudi metadata 
([RFC-15](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+15%3A+HUDI+File+Listing+Improvements)),
 
+adding consolidated Hudi metadata 
([RFC-15](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=147427331)),
 
 the follow up work would involve re-organizing archival metadata so that we 
can do periodic compactions to control 
 file-sizing of these archive files.
 
diff --git a/website/versioned_docs/version-0.14.0/tuning-guide.md b/website/versioned_docs/version-0.14.0/tuning-guide.md
index 4eaddce2dbd3..96a64ed78e95 100644
--- a/website/versioned_docs/version-0.14.0/tuning-guide.md
+++ b/website/versioned_docs/version-0.14.0/tuning-guide.md
@@ -57,7 +57,7 @@ When upsert large input data, hudi spills part of input data to disk when reach
 
 ### How to tune shuffle parallelism of Hudi jobs ?
 
-First, let's understand what the term parallelism means in the context of Hudi 
jobs. For any Hudi job using Spark, parallelism equals to the number of spark 
partitions that should be generated for a particular stage in the DAG. To 
understand more about spark partitions, read this 
[article](https://www.dezyre.com/article/how-data-partitioning-in-spark-helps-achieve-more-parallelism/297).
 In spark, each spark partition is mapped to a spark task that can be executed 
on an executor. Typicall [...]
+First, let's understand what the term parallelism means in the context of Hudi 
jobs. For any Hudi job using Spark, parallelism equals to the number of spark 
partitions that should be generated for a particular stage in the DAG. To 
understand more about spark partitions, read this 
[article](https://www.projectpro.io/article/how-data-partitioning-in-spark-helps-achieve-more-parallelism/297).
 In spark, each spark partition is mapped to a spark task that can be executed 
on an executor. Typic [...]
 
 (Spark Application → N Spark Jobs → M Spark Stages → T Spark Tasks) on (E 
executors with C cores)
 
diff --git a/website/versioned_docs/version-0.14.0/use_cases.md b/website/versioned_docs/version-0.14.0/use_cases.md
index 893aa653f5e7..e9ccd84ef310 100644
--- a/website/versioned_docs/version-0.14.0/use_cases.md
+++ b/website/versioned_docs/version-0.14.0/use_cases.md
@@ -22,7 +22,7 @@ more value is created.
 For RDBMS ingestion, Hudi provides __faster loads via Upserts__, as opposed 
costly & inefficient bulk loads. It's very common to use a change capture 
solution like
 [Debezium](http://debezium.io/) or [Kafka 
Connect](https://docs.confluent.io/platform/current/connect/index) or 
 [Sqoop Incremental 
Import](https://sqoop.apache.org/docs/1.4.2/SqoopUserGuide#_incremental_imports)
 and apply them to an
-equivalent Hudi table on DFS. For NoSQL datastores like 
[Cassandra](http://cassandra.apache.org/) / 
[Voldemort](http://www.project-voldemort.com/voldemort/) / 
[HBase](https://hbase.apache.org/), 
+equivalent Hudi table on DFS. For NoSQL datastores like 
[Cassandra](http://cassandra.apache.org/) / [HBase](https://hbase.apache.org/), 
 even moderately big installations store billions of rows. It goes without 
saying that __full bulk loads are simply infeasible__ and more efficient 
approaches 
 are needed if ingestion is to keep up with the typically high update volumes.
 
diff --git a/website/versioned_docs/version-0.14.1/compaction.md b/website/versioned_docs/version-0.14.1/compaction.md
index 5df14e5af971..3ecd95b43853 100644
--- a/website/versioned_docs/version-0.14.1/compaction.md
+++ b/website/versioned_docs/version-0.14.1/compaction.md
@@ -13,7 +13,7 @@ not applicable to Copy On Write(COW) tables and only applies to MOR tables.
 
 ### Why MOR tables need compaction?
 To understand the significance of compaction in MOR tables, it is helpful to 
understand the MOR table layout first. In Hudi, 
-data is organized in terms of [file 
groups](https://hudi.apache.org/docs/file_layouts/). Each file group in a MOR 
table 
+data is organized in terms of [file groups](file_layouts). Each file group in 
a MOR table 
 consists of a base file and one or more log files. Typically, during writes, 
inserts are stored in the base file, and updates 
 are appended to log files.
 
diff --git a/website/versioned_docs/version-0.14.1/comparison.md b/website/versioned_docs/version-0.14.1/comparison.md
index 681b359a4de8..0bcce2ace532 100644
--- a/website/versioned_docs/version-0.14.1/comparison.md
+++ b/website/versioned_docs/version-0.14.1/comparison.md
@@ -52,5 +52,5 @@ of PrestoDB/SparkSQL/Hive for your queries.
 
 More advanced use cases revolve around the concepts of [incremental 
processing](https://www.oreilly.com/ideas/ubers-case-for-incremental-processing-on-hadoop),
 which effectively
 uses Hudi even inside the `processing` engine to speed up typical batch 
pipelines. For e.g: Hudi can be used as a state store inside a processing DAG 
(similar
-to how 
[rocksDB](https://ci.apache.org/projects/flink/flink-docs-release-1.2/ops/state_backends#the-rocksdbstatebackend)
 is used by Flink). This is an item on the roadmap
+to how 
[rocksDB](https://ci.apache.org/projects/flink/flink-docs-release-1.2/ops/state_backends.html#the-rocksdbstatebackend)
 is used by Flink). This is an item on the roadmap
 and will eventually happen as a [Beam 
Runner](https://issues.apache.org/jira/browse/HUDI-60)
diff --git a/website/versioned_docs/version-0.14.1/configurations.md b/website/versioned_docs/version-0.14.1/configurations.md
index 45d2fd560418..a39c4774aa7b 100644
--- a/website/versioned_docs/version-0.14.1/configurations.md
+++ b/website/versioned_docs/version-0.14.1/configurations.md
@@ -1577,7 +1577,7 @@ These set of configs are used for Hudi Streamer utility which provides the way t
 | [hoodie.streamer.sample.writes.size](#hoodiestreamersamplewritessize)        
                                      | 5000    | Number of records to sample 
from the first write. To improve the estimation's accuracy, for smaller or more 
compressable record size, set the sample size bigger. For bigger or less 
compressable record size, set smaller.<br />`Config Param: 
SAMPLE_WRITES_SIZE`<br />`Since Version: 0.14.0`                                
                                            [...]
 | 
[hoodie.streamer.source.kafka.append.offsets](#hoodiestreamersourcekafkaappendoffsets)
                             | false   | When enabled, appends kafka offset 
info like source offset(_hoodie_kafka_source_offset), partition 
(_hoodie_kafka_source_partition) and timestamp (_hoodie_kafka_source_timestamp) 
to the records. By default its disabled and no kafka offsets are added<br 
/>`Config Param: KAFKA_APPEND_OFFSETS`                                          
                               [...]
 | 
[hoodie.streamer.source.sanitize.invalid.char.mask](#hoodiestreamersourcesanitizeinvalidcharmask)
                  | __      | Defines the character sequence that replaces 
invalid characters in schema field names if 
hoodie.streamer.source.sanitize.invalid.schema.field.names is enabled.<br 
/>`Config Param: SCHEMA_FIELD_NAME_INVALID_CHAR_MASK`                           
                                                                                
                                         [...]
-| 
[hoodie.streamer.source.sanitize.invalid.schema.field.names](#hoodiestreamersourcesanitizeinvalidschemafieldnames)
 | false   | Sanitizes names of invalid schema fields both in the data read 
from source and also in the schema Replaces invalid characters with 
hoodie.streamer.source.sanitize.invalid.char.mask. Invalid characters are by 
goes by avro naming convention 
(https://avro.apache.org/docs/current/spec.html#names).<br />`Config Param: 
SANITIZE_SCHEMA_FIELD_NAMES`                     [...]
+| 
[hoodie.streamer.source.sanitize.invalid.schema.field.names](#hoodiestreamersourcesanitizeinvalidschemafieldnames)
 | false   | Sanitizes names of invalid schema fields both in the data read 
from source and also in the schema Replaces invalid characters with 
hoodie.streamer.source.sanitize.invalid.char.mask. Invalid characters are by 
goes by avro naming convention 
(https://avro.apache.org/docs/++version++/specification/#names).<br />`Config 
Param: SANITIZE_SCHEMA_FIELD_NAMES`            [...]
 ---
 
 
diff --git a/website/versioned_docs/version-0.14.1/faq_storage.md b/website/versioned_docs/version-0.14.1/faq_storage.md
index 43ca76817a8c..4f7bfd498aeb 100644
--- a/website/versioned_docs/version-0.14.1/faq_storage.md
+++ b/website/versioned_docs/version-0.14.1/faq_storage.md
@@ -47,7 +47,7 @@ The indexing component is a key part of the Hudi writing and it maps a given rec
 Hudi supports a few options for indexing as below
 
 *   _HoodieBloomIndex_ : Uses a bloom filter and ranges information placed in 
the footer of parquet/base files (and soon log files as well)
-*   _HoodieGlobalBloomIndex_ : The non global indexing only enforces 
uniqueness of a key inside a single partition i.e the user is expected to know 
the partition under which a given record key is stored. This helps the indexing 
scale very well for even [very large 
datasets](https://eng.uber.com/uber-big-data-platform/). However, in some 
cases, it might be necessary instead to do the de-duping/enforce uniqueness 
across all partitions and the global bloom index does exactly that. If this i 
[...]
+*   _HoodieGlobalBloomIndex_ : The non global indexing only enforces 
uniqueness of a key inside a single partition i.e the user is expected to know 
the partition under which a given record key is stored. This helps the indexing 
scale very well for even [very large 
datasets](https://www.uber.com/en-IN/blog/uber-big-data-platform/). However, in 
some cases, it might be necessary instead to do the de-duping/enforce 
uniqueness across all partitions and the global bloom index does exactly that 
[...]
 *   _HBaseIndex_ : Apache HBase is a key value store, typically found in close 
proximity to HDFS. You can also store the index inside HBase, which could be 
handy if you are already operating HBase.
 *   _HoodieSimpleIndex (default)_ : A simple index which reads interested 
fields (record key and partition path) from base files and joins with incoming 
records to find the tagged location.
 *   _HoodieGlobalSimpleIndex_ : Global version of Simple Index, where in 
uniqueness is on record key across entire table.
diff --git a/website/versioned_docs/version-0.14.1/metadata.md b/website/versioned_docs/version-0.14.1/metadata.md
index 52e4c788275f..df520e8a5564 100644
--- a/website/versioned_docs/version-0.14.1/metadata.md
+++ b/website/versioned_docs/version-0.14.1/metadata.md
@@ -66,7 +66,7 @@ mechanism and is built on the following core principles:
 
 Following are the different indices currently available under the metadata 
table.
 
-- ***[files 
index](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+15%3A+HUDI+File+Listing+Improvements)***:
 
+- ***[files 
index](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=147427331)***:
 
   Stored as *files* partition in the metadata table. Contains file information 
such as file name, size, and active state
   for each partition in the data table. Improves the files listing performance 
by avoiding direct file system calls such
   as *exists, listStatus* and *listFiles* on the data table.
diff --git a/website/versioned_docs/version-0.14.1/overview.mdx b/website/versioned_docs/version-0.14.1/overview.mdx
index e6a288328b63..8123b427d464 100644
--- a/website/versioned_docs/version-0.14.1/overview.mdx
+++ b/website/versioned_docs/version-0.14.1/overview.mdx
@@ -20,7 +20,7 @@ and [concurrency](/docs/next/concurrency_control) all while keeping your data in
 
 Not only is Apache Hudi great for streaming workloads, but it also allows you 
to create efficient incremental batch pipelines.
 Read the docs for more [use case descriptions](/docs/use_cases) and check out 
[who's using Hudi](/powered-by), to see how some of the
-largest data lakes in the world including 
[Uber](https://eng.uber.com/uber-big-data-platform/), 
[Amazon](https://aws.amazon.com/blogs/big-data/how-amazon-transportation-service-enabled-near-real-time-event-analytics-at-petabyte-scale-using-aws-glue-with-apache-hudi/),
+largest data lakes in the world including 
[Uber](https://www.uber.com/en-IN/blog/uber-big-data-platform/), 
[Amazon](https://aws.amazon.com/blogs/big-data/how-amazon-transportation-service-enabled-near-real-time-event-analytics-at-petabyte-scale-using-aws-glue-with-apache-hudi/),
 
[ByteDance](http://hudi.apache.org/blog/2021/09/01/building-eb-level-data-lake-using-hudi-at-bytedance),
 [Robinhood](https://s.apache.org/hudi-robinhood-talk) and more are 
transforming their production data lakes with Hudi.
 
diff --git a/website/versioned_docs/version-0.14.1/s3_hoodie.md b/website/versioned_docs/version-0.14.1/s3_hoodie.md
index b990add7d4b7..5faad6e62be9 100644
--- a/website/versioned_docs/version-0.14.1/s3_hoodie.md
+++ b/website/versioned_docs/version-0.14.1/s3_hoodie.md
@@ -88,7 +88,7 @@ AWS glue data libraries are needed if AWS glue data is used
 
 ## AWS S3 Versioned Bucket
 
-With versioned buckets any object deleted creates a [Delete 
Marker](https://docs.aws.amazon.com/AmazonS3/latest/userguide/DeleteMarker.html),
 as Hudi cleans up files using [Cleaner 
utility](https://hudi.apache.orghoodie_cleaner) the number of Delete Markers 
increases over time.
+With versioned buckets any object deleted creates a [Delete 
Marker](https://docs.aws.amazon.com/AmazonS3/latest/userguide/DeleteMarker.html),
 as Hudi cleans up files using [Cleaner utility](hoodie_cleaner) the number of 
Delete Markers increases over time.
 It is important to configure the [Lifecycle 
Rule](https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-lifecycle-mgmt.html)
 correctly
 to clean up these delete markers as the List operation can choke if the number 
of delete markers reaches 1000.
 We recommend cleaning up Delete Markers after 1 day in Lifecycle Rule.
\ No newline at end of file
diff --git a/website/versioned_docs/version-0.14.1/sql_queries.md b/website/versioned_docs/version-0.14.1/sql_queries.md
index 44fbd055289d..a43c2bfcf992 100644
--- a/website/versioned_docs/version-0.14.1/sql_queries.md
+++ b/website/versioned_docs/version-0.14.1/sql_queries.md
@@ -337,7 +337,7 @@ for more details.
 
 Copy on Write Tables in Hudi version 0.10.0 can be queried via Doris external 
tables starting from Doris version 1.1.
 Please refer
-to [Doris Hudi 
Catalog](https://doris.apache.org/docs/lakehouse/datalake-analytics/hudi/)
+to [Doris Hudi 
Catalog](https://doris.apache.org/docs/3.x/lakehouse/catalogs/hudi-catalog)
 for more details on the setup.
 
 :::note
diff --git a/website/versioned_docs/version-0.14.1/structure.md b/website/versioned_docs/version-0.14.1/structure.md
index 137520dd2a54..0e15e353c30a 100644
--- a/website/versioned_docs/version-0.14.1/structure.md
+++ b/website/versioned_docs/version-0.14.1/structure.md
@@ -9,7 +9,7 @@ Hudi (pronounced “Hoodie”) ingests & manages storage of large analytical tab
 
  * **Read Optimized query** - Provides excellent query performance on pure 
columnar storage, much like plain [Parquet](https://parquet.apache.org/) tables.
  * **Incremental query** - Provides a change stream out of the dataset to feed 
downstream jobs/ETLs.
- * **Snapshot query** - Provides queries on real-time data, using a 
combination of columnar & row based storage (e.g Parquet + 
[Avro](http://avro.apache.org/docs/current/mr))
+ * **Snapshot query** - Provides queries on real-time data, using a 
combination of columnar & row based storage (e.g Parquet + 
[Avro](https://avro.apache.org/docs/++version++/mapreduce-guide/))
 
 <figure>
     <img className="docimage" 
src={require("/assets/images/hudi_intro_1.png").default} alt="hudi_intro_1.png" 
/>
diff --git a/website/versioned_docs/version-0.14.1/syncing_datahub.md b/website/versioned_docs/version-0.14.1/syncing_datahub.md
index 40fcd1d1891e..952249d3ff68 100644
--- a/website/versioned_docs/version-0.14.1/syncing_datahub.md
+++ b/website/versioned_docs/version-0.14.1/syncing_datahub.md
@@ -3,7 +3,7 @@ title: DataHub
 keywords: [hudi, datahub, sync]
 ---
 
-[DataHub](https://datahubproject.io/) is a rich metadata platform that 
supports features like data discovery, data
+[DataHub](https://datahub.com/) is a rich metadata platform that supports 
features like data discovery, data
 obeservability, federated governance, etc.
 
 Since Hudi 0.11.0, you can now sync to a DataHub instance by setting 
`DataHubSyncTool` as one of the sync tool classes
diff --git a/website/versioned_docs/version-0.14.1/table_types.md b/website/versioned_docs/version-0.14.1/table_types.md
index 28814d239e81..2174aae8f7a8 100644
--- a/website/versioned_docs/version-0.14.1/table_types.md
+++ b/website/versioned_docs/version-0.14.1/table_types.md
@@ -149,4 +149,4 @@ Refer [here](https://hudi.apache.org/docs/next/configurations#Flink-Options) for
 
 * [Comparing Apache Hudi's MOR and COW Tables, Use Cases from 
Uber](https://youtu.be/BiTXyzFNHlA)
 * [Different table types in Apache Hudi, MOR and COW, Deep 
Dive](https://youtu.be/vyEvlt57L-s)
-* [How to Query Hudi Tables in Incremental Fashion and Get only New data on 
AWS Glue | Hands on Lab](https://www.youtube.com/watch?v=c6DCJR91rBQx)
\ No newline at end of file
+* [How to Query Hudi Tables in Incremental Fashion and Get only New data on 
AWS Glue | Hands on Lab](https://www.youtube.com/watch?v=c6DCJR91rBQ)
\ No newline at end of file
diff --git a/website/versioned_docs/version-0.14.1/troubleshooting.md b/website/versioned_docs/version-0.14.1/troubleshooting.md
index aaa3f4feb635..13d3f3ac98af 100644
--- a/website/versioned_docs/version-0.14.1/troubleshooting.md
+++ b/website/versioned_docs/version-0.14.1/troubleshooting.md
@@ -40,7 +40,7 @@ You can increase `hoodie.commits.archival.batch` moving forward to increase the
 In addition, you can increase the difference between the 2 watermark 
configurations : `hoodie.keep.max.commits` (default : 30) 
 and `hoodie.keep.min.commits` (default : 20). This way, you can reduce the 
number of archive files created and also 
 at the same time increase the number of metadata archived per archive file. 
Note that post 0.7.0 release where we are 
-adding consolidated Hudi metadata 
([RFC-15](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+15%3A+HUDI+File+Listing+Improvements)),
 
+adding consolidated Hudi metadata 
([RFC-15](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=147427331)),
 
 the follow up work would involve re-organizing archival metadata so that we 
can do periodic compactions to control 
 file-sizing of these archive files.
 
diff --git a/website/versioned_docs/version-0.14.1/tuning-guide.md b/website/versioned_docs/version-0.14.1/tuning-guide.md
index 4eaddce2dbd3..96a64ed78e95 100644
--- a/website/versioned_docs/version-0.14.1/tuning-guide.md
+++ b/website/versioned_docs/version-0.14.1/tuning-guide.md
@@ -57,7 +57,7 @@ When upsert large input data, hudi spills part of input data to disk when reach
 
 ### How to tune shuffle parallelism of Hudi jobs ?
 
-First, let's understand what the term parallelism means in the context of Hudi 
jobs. For any Hudi job using Spark, parallelism equals to the number of spark 
partitions that should be generated for a particular stage in the DAG. To 
understand more about spark partitions, read this 
[article](https://www.dezyre.com/article/how-data-partitioning-in-spark-helps-achieve-more-parallelism/297).
 In spark, each spark partition is mapped to a spark task that can be executed 
on an executor. Typicall [...]
+First, let's understand what the term parallelism means in the context of Hudi 
jobs. For any Hudi job using Spark, parallelism equals to the number of spark 
partitions that should be generated for a particular stage in the DAG. To 
understand more about spark partitions, read this 
[article](https://www.projectpro.io/article/how-data-partitioning-in-spark-helps-achieve-more-parallelism/297).
 In spark, each spark partition is mapped to a spark task that can be executed 
on an executor. Typic [...]
 
 (Spark Application → N Spark Jobs → M Spark Stages → T Spark Tasks) on (E 
executors with C cores)
 
diff --git a/website/versioned_docs/version-0.14.1/use_cases.md b/website/versioned_docs/version-0.14.1/use_cases.md
index 4d06f1e571a6..fb6061b8d1b5 100644
--- a/website/versioned_docs/version-0.14.1/use_cases.md
+++ b/website/versioned_docs/version-0.14.1/use_cases.md
@@ -22,7 +22,7 @@ more value is created.
 For RDBMS ingestion, Hudi provides __faster loads via Upserts__, as opposed 
costly & inefficient bulk loads. It's very common to use a change capture 
solution like
 [Debezium](http://debezium.io/) or [Kafka 
Connect](https://docs.confluent.io/platform/current/connect/index) or 
 [Sqoop Incremental 
Import](https://sqoop.apache.org/docs/1.4.2/SqoopUserGuide#_incremental_imports)
 and apply them to an
-equivalent Hudi table on DFS. For NoSQL datastores like 
[Cassandra](http://cassandra.apache.org/) / 
[Voldemort](http://www.project-voldemort.com/voldemort/) / 
[HBase](https://hbase.apache.org/), 
+equivalent Hudi table on DFS. For NoSQL datastores like 
[Cassandra](http://cassandra.apache.org/) / [HBase](https://hbase.apache.org/), 
 even moderately big installations store billions of rows. It goes without 
saying that __full bulk loads are simply infeasible__ and more efficient 
approaches 
 are needed if ingestion is to keep up with the typically high update volumes.
 
diff --git a/website/versioned_docs/version-0.15.0/compaction.md b/website/versioned_docs/version-0.15.0/compaction.md
index 54fdfdb54987..1ec5506c3535 100644
--- a/website/versioned_docs/version-0.15.0/compaction.md
+++ b/website/versioned_docs/version-0.15.0/compaction.md
@@ -13,7 +13,7 @@ not applicable to Copy On Write(COW) tables and only applies to MOR tables.
 
 ### Why MOR tables need compaction?
 To understand the significance of compaction in MOR tables, it is helpful to 
understand the MOR table layout first. In Hudi, 
-data is organized in terms of [file 
groups](https://hudi.apache.org/docs/file_layouts/). Each file group in a MOR 
table 
+data is organized in terms of [file groups](file_layouts). Each file group in 
a MOR table 
 consists of a base file and one or more log files. Typically, during writes, 
inserts are stored in the base file, and updates 
 are appended to log files.
 
diff --git a/website/versioned_docs/version-0.15.0/comparison.md b/website/versioned_docs/version-0.15.0/comparison.md
index 681b359a4de8..0bcce2ace532 100644
--- a/website/versioned_docs/version-0.15.0/comparison.md
+++ b/website/versioned_docs/version-0.15.0/comparison.md
@@ -52,5 +52,5 @@ of PrestoDB/SparkSQL/Hive for your queries.
 
 More advanced use cases revolve around the concepts of [incremental 
processing](https://www.oreilly.com/ideas/ubers-case-for-incremental-processing-on-hadoop),
 which effectively
 uses Hudi even inside the `processing` engine to speed up typical batch 
pipelines. For e.g: Hudi can be used as a state store inside a processing DAG 
(similar
-to how 
[rocksDB](https://ci.apache.org/projects/flink/flink-docs-release-1.2/ops/state_backends#the-rocksdbstatebackend)
 is used by Flink). This is an item on the roadmap
+to how 
[rocksDB](https://ci.apache.org/projects/flink/flink-docs-release-1.2/ops/state_backends.html#the-rocksdbstatebackend)
 is used by Flink). This is an item on the roadmap
 and will eventually happen as a [Beam 
Runner](https://issues.apache.org/jira/browse/HUDI-60)
diff --git a/website/versioned_docs/version-0.15.0/configurations.md b/website/versioned_docs/version-0.15.0/configurations.md
index 19cf11a27c23..d90509a4e147 100644
--- a/website/versioned_docs/version-0.15.0/configurations.md
+++ b/website/versioned_docs/version-0.15.0/configurations.md
@@ -1730,7 +1730,7 @@ These set of configs are used for Hudi Streamer utility which provides the way t
 | [hoodie.streamer.sample.writes.size](#hoodiestreamersamplewritessize)        
                                      | 5000    | Number of records to sample 
from the first write. To improve the estimation's accuracy, for smaller or more 
compressable record size, set the sample size bigger. For bigger or less 
compressable record size, set smaller.<br />`Config Param: 
SAMPLE_WRITES_SIZE`<br />`Since Version: 0.14.0`                                
                                            [...]
 | 
[hoodie.streamer.source.kafka.append.offsets](#hoodiestreamersourcekafkaappendoffsets)
                             | false   | When enabled, appends kafka offset 
info like source offset(_hoodie_kafka_source_offset), partition 
(_hoodie_kafka_source_partition) and timestamp (_hoodie_kafka_source_timestamp) 
to the records. By default its disabled and no kafka offsets are added<br 
/>`Config Param: KAFKA_APPEND_OFFSETS`                                          
                               [...]
 | 
[hoodie.streamer.source.sanitize.invalid.char.mask](#hoodiestreamersourcesanitizeinvalidcharmask)
                  | __      | Defines the character sequence that replaces 
invalid characters in schema field names if 
hoodie.streamer.source.sanitize.invalid.schema.field.names is enabled.<br 
/>`Config Param: SCHEMA_FIELD_NAME_INVALID_CHAR_MASK`                           
                                                                                
                                         [...]
-| 
[hoodie.streamer.source.sanitize.invalid.schema.field.names](#hoodiestreamersourcesanitizeinvalidschemafieldnames)
 | false   | Sanitizes names of invalid schema fields both in the data read 
from source and also in the schema Replaces invalid characters with 
hoodie.streamer.source.sanitize.invalid.char.mask. Invalid characters are by 
goes by avro naming convention 
(https://avro.apache.org/docs/current/spec.html#names).<br />`Config Param: 
SANITIZE_SCHEMA_FIELD_NAMES`                     [...]
+| 
[hoodie.streamer.source.sanitize.invalid.schema.field.names](#hoodiestreamersourcesanitizeinvalidschemafieldnames)
 | false   | Sanitizes names of invalid schema fields both in the data read 
from source and also in the schema Replaces invalid characters with 
hoodie.streamer.source.sanitize.invalid.char.mask. Invalid characters are by 
goes by avro naming convention 
(https://avro.apache.org/docs/++version++/specification/#names).<br />`Config 
Param: SANITIZE_SCHEMA_FIELD_NAMES`            [...]
 ---
 
 
diff --git a/website/versioned_docs/version-0.15.0/faq_storage.md b/website/versioned_docs/version-0.15.0/faq_storage.md
index 359c7764da61..c9456670ecdc 100644
--- a/website/versioned_docs/version-0.15.0/faq_storage.md
+++ b/website/versioned_docs/version-0.15.0/faq_storage.md
@@ -47,7 +47,7 @@ The indexing component is a key part of the Hudi writing and it maps a given rec
 Hudi supports a few options for indexing as below
 
 *   _HoodieBloomIndex_ : Uses a bloom filter and ranges information placed in 
the footer of parquet/base files (and soon log files as well)
-*   _HoodieGlobalBloomIndex_ : The non global indexing only enforces 
uniqueness of a key inside a single partition i.e the user is expected to know 
the partition under which a given record key is stored. This helps the indexing 
scale very well for even [very large 
datasets](https://eng.uber.com/uber-big-data-platform/). However, in some 
cases, it might be necessary instead to do the de-duping/enforce uniqueness 
across all partitions and the global bloom index does exactly that. If this i 
[...]
+*   _HoodieGlobalBloomIndex_ : The non global indexing only enforces 
uniqueness of a key inside a single partition i.e the user is expected to know 
the partition under which a given record key is stored. This helps the indexing 
scale very well for even [very large 
datasets](https://www.uber.com/en-IN/blog/uber-big-data-platform/). However, in 
some cases, it might be necessary instead to do the de-duping/enforce 
uniqueness across all partitions and the global bloom index does exactly that 
[...]
 *   _HBaseIndex_ : Apache HBase is a key value store, typically found in close 
proximity to HDFS. You can also store the index inside HBase, which could be 
handy if you are already operating HBase.
 *   _HoodieSimpleIndex (default)_ : A simple index which reads interested 
fields (record key and partition path) from base files and joins with incoming 
records to find the tagged location.
 *   _HoodieGlobalSimpleIndex_ : Global version of Simple Index, where in 
uniqueness is on record key across entire table.
diff --git a/website/versioned_docs/version-0.15.0/metadata.md 
b/website/versioned_docs/version-0.15.0/metadata.md
index b2b57e62f84e..93580bdc04fe 100644
--- a/website/versioned_docs/version-0.15.0/metadata.md
+++ b/website/versioned_docs/version-0.15.0/metadata.md
@@ -66,7 +66,7 @@ mechanism and is built on the following core principles:
 
 Following are the different indices currently available under the metadata 
table.
 
-- ***[files 
index](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+15%3A+HUDI+File+Listing+Improvements)***:
 
+- ***[files 
index](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=147427331)***:
 
   Stored as *files* partition in the metadata table. Contains file information 
such as file name, size, and active state
   for each partition in the data table. Improves the files listing performance 
by avoiding direct file system calls such
   as *exists, listStatus* and *listFiles* on the data table.
diff --git a/website/versioned_docs/version-0.15.0/overview.mdx 
b/website/versioned_docs/version-0.15.0/overview.mdx
index 27237f6438d3..6083a9d248fa 100644
--- a/website/versioned_docs/version-0.15.0/overview.mdx
+++ b/website/versioned_docs/version-0.15.0/overview.mdx
@@ -20,7 +20,7 @@ and [concurrency](/docs/next/concurrency_control) all while 
keeping your data in
 
 Not only is Apache Hudi great for streaming workloads, but it also allows you 
to create efficient incremental batch pipelines.
 Read the docs for more [use case descriptions](/docs/use_cases) and check out 
[who's using Hudi](/powered-by), to see how some of the
-largest data lakes in the world including 
[Uber](https://eng.uber.com/uber-big-data-platform/), 
[Amazon](https://aws.amazon.com/blogs/big-data/how-amazon-transportation-service-enabled-near-real-time-event-analytics-at-petabyte-scale-using-aws-glue-with-apache-hudi/),
+largest data lakes in the world including 
[Uber](https://www.uber.com/en-IN/blog/uber-big-data-platform/), 
[Amazon](https://aws.amazon.com/blogs/big-data/how-amazon-transportation-service-enabled-near-real-time-event-analytics-at-petabyte-scale-using-aws-glue-with-apache-hudi/),
 
[ByteDance](http://hudi.apache.org/blog/2021/09/01/building-eb-level-data-lake-using-hudi-at-bytedance),
 [Robinhood](https://s.apache.org/hudi-robinhood-talk) and more are 
transforming their production data lakes with Hudi.
 
diff --git 
a/website/versioned_docs/version-0.15.0/reading_tables_batch_reads.md 
b/website/versioned_docs/version-0.15.0/reading_tables_batch_reads.md
index d247fd4c3d08..f3ddcd236694 100644
--- a/website/versioned_docs/version-0.15.0/reading_tables_batch_reads.md
+++ b/website/versioned_docs/version-0.15.0/reading_tables_batch_reads.md
@@ -32,4 +32,4 @@ df = df.where(df["foo"] > 5)
 df.show()
 ```
 
-Check out the Daft docs for [Hudi 
integration](https://www.getdaft.io/projects/docs/en/latest/user_guide/integrations/hudi.html).
+Check out the Daft docs for [Hudi 
integration](https://docs.daft.ai/en/stable/connectors/hudi/).
diff --git a/website/versioned_docs/version-0.15.0/s3_hoodie.md 
b/website/versioned_docs/version-0.15.0/s3_hoodie.md
index b990add7d4b7..5faad6e62be9 100644
--- a/website/versioned_docs/version-0.15.0/s3_hoodie.md
+++ b/website/versioned_docs/version-0.15.0/s3_hoodie.md
@@ -88,7 +88,7 @@ AWS glue data libraries are needed if AWS glue data is used
 
 ## AWS S3 Versioned Bucket
 
-With versioned buckets any object deleted creates a [Delete 
Marker](https://docs.aws.amazon.com/AmazonS3/latest/userguide/DeleteMarker.html),
 as Hudi cleans up files using [Cleaner 
utility](https://hudi.apache.orghoodie_cleaner) the number of Delete Markers 
increases over time.
+With versioned buckets any object deleted creates a [Delete 
Marker](https://docs.aws.amazon.com/AmazonS3/latest/userguide/DeleteMarker.html),
 as Hudi cleans up files using [Cleaner utility](hoodie_cleaner) the number of 
Delete Markers increases over time.
 It is important to configure the [Lifecycle 
Rule](https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-lifecycle-mgmt.html)
 correctly
 to clean up these delete markers as the List operation can choke if the number 
of delete markers reaches 1000.
 We recommend cleaning up Delete Markers after 1 day in Lifecycle Rule.
\ No newline at end of file
diff --git a/website/versioned_docs/version-0.15.0/sql_queries.md 
b/website/versioned_docs/version-0.15.0/sql_queries.md
index 998d90d6b553..7d3f96fbb1a4 100644
--- a/website/versioned_docs/version-0.15.0/sql_queries.md
+++ b/website/versioned_docs/version-0.15.0/sql_queries.md
@@ -336,7 +336,7 @@ for more details.
 ## Doris
 
 The Doris integration currently support Copy on Write and Merge On Read tables 
in Hudi since version 0.10.0. You can query Hudi tables via Doris from Doris 
version 2.0 Doris offers a multi-catalog, which is designed to make it easier 
to connect to external data catalogs to enhance Doris's data lake analysis and 
federated data query capabilities. Please refer
-to [Doris Hudi 
Catalog](https://doris.apache.org/docs/lakehouse/datalake-analytics/hudi/) for 
more details on the setup.
+to [Doris Hudi 
Catalog](https://doris.apache.org/docs/3.x/lakehouse/catalogs/hudi-catalog) for 
more details on the setup.
 
 :::note
 The current default supported version of Hudi is 0.10.0 ~ 0.13.1, and has not 
been tested in other versions. More versions will be supported in the future.
diff --git a/website/versioned_docs/version-0.15.0/structure.md 
b/website/versioned_docs/version-0.15.0/structure.md
index 137520dd2a54..0e15e353c30a 100644
--- a/website/versioned_docs/version-0.15.0/structure.md
+++ b/website/versioned_docs/version-0.15.0/structure.md
@@ -9,7 +9,7 @@ Hudi (pronounced “Hoodie”) ingests & manages storage of large 
analytical tab
 
  * **Read Optimized query** - Provides excellent query performance on pure 
columnar storage, much like plain [Parquet](https://parquet.apache.org/) tables.
  * **Incremental query** - Provides a change stream out of the dataset to feed 
downstream jobs/ETLs.
- * **Snapshot query** - Provides queries on real-time data, using a 
combination of columnar & row based storage (e.g Parquet + 
[Avro](http://avro.apache.org/docs/current/mr))
+ * **Snapshot query** - Provides queries on real-time data, using a 
combination of columnar & row based storage (e.g Parquet + 
[Avro](https://avro.apache.org/docs/++version++/mapreduce-guide/))
 
 <figure>
     <img className="docimage" 
src={require("/assets/images/hudi_intro_1.png").default} alt="hudi_intro_1.png" 
/>
diff --git a/website/versioned_docs/version-0.15.0/syncing_datahub.md 
b/website/versioned_docs/version-0.15.0/syncing_datahub.md
index 40fcd1d1891e..952249d3ff68 100644
--- a/website/versioned_docs/version-0.15.0/syncing_datahub.md
+++ b/website/versioned_docs/version-0.15.0/syncing_datahub.md
@@ -3,7 +3,7 @@ title: DataHub
 keywords: [hudi, datahub, sync]
 ---
 
-[DataHub](https://datahubproject.io/) is a rich metadata platform that 
supports features like data discovery, data
+[DataHub](https://datahub.com/) is a rich metadata platform that supports 
features like data discovery, data
 obeservability, federated governance, etc.
 
 Since Hudi 0.11.0, you can now sync to a DataHub instance by setting 
`DataHubSyncTool` as one of the sync tool classes
diff --git a/website/versioned_docs/version-0.15.0/table_types.md 
b/website/versioned_docs/version-0.15.0/table_types.md
index e280909a9f3b..eb2495894216 100644
--- a/website/versioned_docs/version-0.15.0/table_types.md
+++ b/website/versioned_docs/version-0.15.0/table_types.md
@@ -149,4 +149,4 @@ Refer 
[here](https://hudi.apache.org/docs/next/configurations#Flink-Options) for
 
 * [Comparing Apache Hudi's MOR and COW Tables, Use Cases from 
Uber](https://youtu.be/BiTXyzFNHlA)
 * [Different table types in Apache Hudi, MOR and COW, Deep 
Dive](https://youtu.be/vyEvlt57L-s)
-* [How to Query Hudi Tables in Incremental Fashion and Get only New data on 
AWS Glue | Hands on Lab](https://www.youtube.com/watch?v=c6DCJR91rBQx)
\ No newline at end of file
+* [How to Query Hudi Tables in Incremental Fashion and Get only New data on 
AWS Glue | Hands on Lab](https://www.youtube.com/watch?v=c6DCJR91rBQ)
\ No newline at end of file
diff --git a/website/versioned_docs/version-0.15.0/troubleshooting.md 
b/website/versioned_docs/version-0.15.0/troubleshooting.md
index f16fa458ee7f..6756033995b4 100644
--- a/website/versioned_docs/version-0.15.0/troubleshooting.md
+++ b/website/versioned_docs/version-0.15.0/troubleshooting.md
@@ -40,7 +40,7 @@ You can increase `hoodie.commits.archival.batch` moving 
forward to increase the
 In addition, you can increase the difference between the 2 watermark 
configurations : `hoodie.keep.max.commits` (default : 30) 
 and `hoodie.keep.min.commits` (default : 20). This way, you can reduce the 
number of archive files created and also 
 at the same time increase the number of metadata archived per archive file. 
Note that post 0.7.0 release where we are 
-adding consolidated Hudi metadata 
([RFC-15](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+15%3A+HUDI+File+Listing+Improvements)),
 
+adding consolidated Hudi metadata 
([RFC-15](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=147427331)),
 
 the follow up work would involve re-organizing archival metadata so that we 
can do periodic compactions to control 
 file-sizing of these archive files.
 
diff --git a/website/versioned_docs/version-0.15.0/tuning-guide.md 
b/website/versioned_docs/version-0.15.0/tuning-guide.md
index 4a1f72f1b05f..107fa6e67c70 100644
--- a/website/versioned_docs/version-0.15.0/tuning-guide.md
+++ b/website/versioned_docs/version-0.15.0/tuning-guide.md
@@ -57,7 +57,7 @@ When upsert large input data, hudi spills part of input data 
to disk when reach
 
 ### How to tune shuffle parallelism of Hudi jobs ?
 
-First, let's understand what the term parallelism means in the context of Hudi 
jobs. For any Hudi job using Spark, parallelism equals to the number of spark 
partitions that should be generated for a particular stage in the DAG. To 
understand more about spark partitions, read this 
[article](https://www.dezyre.com/article/how-data-partitioning-in-spark-helps-achieve-more-parallelism/297).
 In spark, each spark partition is mapped to a spark task that can be executed 
on an executor. Typicall [...]
+First, let's understand what the term parallelism means in the context of Hudi 
jobs. For any Hudi job using Spark, parallelism equals to the number of spark 
partitions that should be generated for a particular stage in the DAG. To 
understand more about spark partitions, read this 
[article](https://www.projectpro.io/article/how-data-partitioning-in-spark-helps-achieve-more-parallelism/297).
 In spark, each spark partition is mapped to a spark task that can be executed 
on an executor. Typic [...]
 
 (Spark Application → N Spark Jobs → M Spark Stages → T Spark Tasks) on (E 
executors with C cores)
 
diff --git a/website/versioned_docs/version-0.15.0/use_cases.md 
b/website/versioned_docs/version-0.15.0/use_cases.md
index 4d06f1e571a6..fb6061b8d1b5 100644
--- a/website/versioned_docs/version-0.15.0/use_cases.md
+++ b/website/versioned_docs/version-0.15.0/use_cases.md
@@ -22,7 +22,7 @@ more value is created.
 For RDBMS ingestion, Hudi provides __faster loads via Upserts__, as opposed 
costly & inefficient bulk loads. It's very common to use a change capture 
solution like
 [Debezium](http://debezium.io/) or [Kafka 
Connect](https://docs.confluent.io/platform/current/connect/index) or 
 [Sqoop Incremental 
Import](https://sqoop.apache.org/docs/1.4.2/SqoopUserGuide#_incremental_imports)
 and apply them to an
-equivalent Hudi table on DFS. For NoSQL datastores like 
[Cassandra](http://cassandra.apache.org/) / 
[Voldemort](http://www.project-voldemort.com/voldemort/) / 
[HBase](https://hbase.apache.org/), 
+equivalent Hudi table on DFS. For NoSQL datastores like 
[Cassandra](http://cassandra.apache.org/) / [HBase](https://hbase.apache.org/), 
 even moderately big installations store billions of rows. It goes without 
saying that __full bulk loads are simply infeasible__ and more efficient 
approaches 
 are needed if ingestion is to keep up with the typically high update volumes.
 
diff --git a/website/versioned_docs/version-1.0.0/compaction.md 
b/website/versioned_docs/version-1.0.0/compaction.md
index 7859030052aa..941c1d227fce 100644
--- a/website/versioned_docs/version-1.0.0/compaction.md
+++ b/website/versioned_docs/version-1.0.0/compaction.md
@@ -13,7 +13,7 @@ not applicable to Copy On Write(COW) tables and only applies 
to MOR tables.
 
 ### Why MOR tables need compaction?
 To understand the significance of compaction in MOR tables, it is helpful to 
understand the MOR table layout first. In Hudi, 
-data is organized in terms of [file 
groups](https://hudi.apache.org/docs/file_layouts/). Each file group in a MOR 
table 
+data is organized in terms of [file groups](storage_layouts). Each file group 
in a MOR table 
 consists of a base file and one or more log files. Typically, during writes, 
inserts are stored in the base file, and updates 
 are appended to log files.
 
diff --git a/website/versioned_docs/version-1.0.0/comparison.md 
b/website/versioned_docs/version-1.0.0/comparison.md
index 681b359a4de8..0bcce2ace532 100644
--- a/website/versioned_docs/version-1.0.0/comparison.md
+++ b/website/versioned_docs/version-1.0.0/comparison.md
@@ -52,5 +52,5 @@ of PrestoDB/SparkSQL/Hive for your queries.
 
 More advanced use cases revolve around the concepts of [incremental 
processing](https://www.oreilly.com/ideas/ubers-case-for-incremental-processing-on-hadoop),
 which effectively
 uses Hudi even inside the `processing` engine to speed up typical batch 
pipelines. For e.g: Hudi can be used as a state store inside a processing DAG 
(similar
-to how 
[rocksDB](https://ci.apache.org/projects/flink/flink-docs-release-1.2/ops/state_backends#the-rocksdbstatebackend)
 is used by Flink). This is an item on the roadmap
+to how 
[rocksDB](https://ci.apache.org/projects/flink/flink-docs-release-1.2/ops/state_backends.html#the-rocksdbstatebackend)
 is used by Flink). This is an item on the roadmap
 and will eventually happen as a [Beam 
Runner](https://issues.apache.org/jira/browse/HUDI-60)
diff --git a/website/versioned_docs/version-1.0.0/configurations.md 
b/website/versioned_docs/version-1.0.0/configurations.md
index 0758147e0df0..3a17558accc9 100644
--- a/website/versioned_docs/version-1.0.0/configurations.md
+++ b/website/versioned_docs/version-1.0.0/configurations.md
@@ -1825,7 +1825,7 @@ These set of configs are used for Hudi Streamer utility 
which provides the way t
 | [hoodie.streamer.sample.writes.size](#hoodiestreamersamplewritessize)        
                                      | 5000    | Number of records to sample 
from the first write. To improve the estimation's accuracy, for smaller or more 
compressable record size, set the sample size bigger. For bigger or less 
compressable record size, set smaller.<br />`Config Param: 
SAMPLE_WRITES_SIZE`<br />`Since Version: 0.14.0`                                
                                            [...]
 | 
[hoodie.streamer.source.kafka.append.offsets](#hoodiestreamersourcekafkaappendoffsets)
                             | false   | When enabled, appends kafka offset 
info like source offset(_hoodie_kafka_source_offset), partition 
(_hoodie_kafka_source_partition) and timestamp (_hoodie_kafka_source_timestamp) 
to the records. By default its disabled and no kafka offsets are added<br 
/>`Config Param: KAFKA_APPEND_OFFSETS`                                          
                               [...]
 | 
[hoodie.streamer.source.sanitize.invalid.char.mask](#hoodiestreamersourcesanitizeinvalidcharmask)
                  | __      | Defines the character sequence that replaces 
invalid characters in schema field names if 
hoodie.streamer.source.sanitize.invalid.schema.field.names is enabled.<br 
/>`Config Param: SCHEMA_FIELD_NAME_INVALID_CHAR_MASK`                           
                                                                                
                                         [...]
-| 
[hoodie.streamer.source.sanitize.invalid.schema.field.names](#hoodiestreamersourcesanitizeinvalidschemafieldnames)
 | false   | Sanitizes names of invalid schema fields both in the data read 
from source and also in the schema Replaces invalid characters with 
hoodie.streamer.source.sanitize.invalid.char.mask. Invalid characters are by 
goes by avro naming convention 
(https://avro.apache.org/docs/current/spec.html#names).<br />`Config Param: 
SANITIZE_SCHEMA_FIELD_NAMES`                     [...]
+| 
[hoodie.streamer.source.sanitize.invalid.schema.field.names](#hoodiestreamersourcesanitizeinvalidschemafieldnames)
 | false   | Sanitizes names of invalid schema fields both in the data read 
from source and also in the schema Replaces invalid characters with 
hoodie.streamer.source.sanitize.invalid.char.mask. Invalid characters are by 
goes by avro naming convention 
(https://avro.apache.org/docs/++version++/specification/#names).<br />`Config 
Param: SANITIZE_SCHEMA_FIELD_NAMES`            [...]
 ---
 
 
diff --git a/website/versioned_docs/version-1.0.0/faq_storage.md 
b/website/versioned_docs/version-1.0.0/faq_storage.md
index fcce76aa46e1..8917fdcb9abb 100644
--- a/website/versioned_docs/version-1.0.0/faq_storage.md
+++ b/website/versioned_docs/version-1.0.0/faq_storage.md
@@ -47,7 +47,7 @@ The indexing component is a key part of the Hudi writing and 
it maps a given rec
 Hudi supports a few options for indexing as below
 
 *   _HoodieBloomIndex_ : Uses a bloom filter and ranges information placed in 
the footer of parquet/base files (and soon log files as well)
-*   _HoodieGlobalBloomIndex_ : The non global indexing only enforces 
uniqueness of a key inside a single partition i.e the user is expected to know 
the partition under which a given record key is stored. This helps the indexing 
scale very well for even [very large 
datasets](https://eng.uber.com/uber-big-data-platform/). However, in some 
cases, it might be necessary instead to do the de-duping/enforce uniqueness 
across all partitions and the global bloom index does exactly that. If this i 
[...]
+*   _HoodieGlobalBloomIndex_ : The non global indexing only enforces 
uniqueness of a key inside a single partition i.e the user is expected to know 
the partition under which a given record key is stored. This helps the indexing 
scale very well for even [very large 
datasets](https://www.uber.com/en-IN/blog/uber-big-data-platform/). However, in 
some cases, it might be necessary instead to do the de-duping/enforce 
uniqueness across all partitions and the global bloom index does exactly that 
[...]
 *   _HBaseIndex_ : Apache HBase is a key value store, typically found in close 
proximity to HDFS. You can also store the index inside HBase, which could be 
handy if you are already operating HBase.
 *   _HoodieSimpleIndex (default)_ : A simple index which reads interested 
fields (record key and partition path) from base files and joins with incoming 
records to find the tagged location.
 *   _HoodieGlobalSimpleIndex_ : Global version of Simple Index, where in 
uniqueness is on record key across entire table.
diff --git a/website/versioned_docs/version-1.0.0/hudi_stack.md 
b/website/versioned_docs/version-1.0.0/hudi_stack.md
index d28231244187..472a1fe374e3 100644
--- a/website/versioned_docs/version-1.0.0/hudi_stack.md
+++ b/website/versioned_docs/version-1.0.0/hudi_stack.md
@@ -57,7 +57,7 @@ File Slices. File groups contain multiple versions of File 
Slices and are split
 the file-group is uniquely identified by the write that created its base file 
or the first log file, which helps order the File Slices.
 
 - **Metadata Table** : Implemented as another merge-on-read Hudi table, the 
[metadata table](./metadata) efficiently handles quick updates with low write 
amplification. 
-It leverages a 
[SSTable](https://cassandra.apache.org/doc/stable/cassandra/architecture/storage_engine.html#sstables)
 based file format for quick, indexed key lookups, 
+It leverages a 
[SSTable](https://cassandra.apache.org/doc/stable/cassandra/architecture/storage-engine.html#sstables)
 based file format for quick, indexed key lookups, 
 storing vital information like file paths, column statistics and schema. This 
approach streamlines operations by reducing the necessity for expensive cloud 
file listings. 
 
 Hudi’s approach of recording updates into Log Files is more efficient and 
involves low merge overhead than systems like Hive ACID, where merging all 
delta records against 
diff --git a/website/versioned_docs/version-1.0.0/metadata.md 
b/website/versioned_docs/version-1.0.0/metadata.md
index 47661f314114..6ad199e7dec6 100644
--- a/website/versioned_docs/version-1.0.0/metadata.md
+++ b/website/versioned_docs/version-1.0.0/metadata.md
@@ -46,7 +46,7 @@ is tracked using internal tables. This approach provides the 
following advantage
 
 Following are the different types of metadata currently supported.
 
-- ***[files 
listings](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+15%3A+HUDI+File+Listing+Improvements)***:
 
+- ***[files 
listings](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=147427331)***:
 
   Stored as *files* partition in the metadata table. Contains file information 
such as file name, size, and active state
   for each partition in the data table, along with list of all partitions in 
the table. Improves the files listing performance 
   by avoiding direct storage calls such as *exists, listStatus* and 
*listFiles* on the data table.
diff --git a/website/versioned_docs/version-1.0.0/overview.mdx 
b/website/versioned_docs/version-1.0.0/overview.mdx
index bb8910f9c7ed..1e55d6916f3a 100644
--- a/website/versioned_docs/version-1.0.0/overview.mdx
+++ b/website/versioned_docs/version-1.0.0/overview.mdx
@@ -25,7 +25,7 @@ but it also allows you to create efficient incremental batch 
pipelines. Apache H
 Hudi’s advanced performance optimizations, make analytical queries/pipelines 
faster with any of the popular query engines including, Apache Spark, Flink, 
Presto, Trino, Hive, etc.
 
 Read the docs for more [use case descriptions](/docs/use_cases) and check out 
[who's using Hudi](/powered-by), to see how some of the
-largest data lakes in the world including 
[Uber](https://eng.uber.com/uber-big-data-platform/), 
[Amazon](https://aws.amazon.com/blogs/big-data/how-amazon-transportation-service-enabled-near-real-time-event-analytics-at-petabyte-scale-using-aws-glue-with-apache-hudi/),
+largest data lakes in the world including 
[Uber](https://www.uber.com/en-IN/blog/uber-big-data-platform/), 
[Amazon](https://aws.amazon.com/blogs/big-data/how-amazon-transportation-service-enabled-near-real-time-event-analytics-at-petabyte-scale-using-aws-glue-with-apache-hudi/),
 
[ByteDance](http://hudi.apache.org/blog/2021/09/01/building-eb-level-data-lake-using-hudi-at-bytedance),
 [Robinhood](https://s.apache.org/hudi-robinhood-talk) and more are 
transforming their production data lakes with Hudi.
 
diff --git a/website/versioned_docs/version-1.0.0/reading_tables_batch_reads.md 
b/website/versioned_docs/version-1.0.0/reading_tables_batch_reads.md
index d247fd4c3d08..f3ddcd236694 100644
--- a/website/versioned_docs/version-1.0.0/reading_tables_batch_reads.md
+++ b/website/versioned_docs/version-1.0.0/reading_tables_batch_reads.md
@@ -32,4 +32,4 @@ df = df.where(df["foo"] > 5)
 df.show()
 ```
 
-Check out the Daft docs for [Hudi 
integration](https://www.getdaft.io/projects/docs/en/latest/user_guide/integrations/hudi.html).
+Check out the Daft docs for [Hudi 
integration](https://docs.daft.ai/en/stable/connectors/hudi/).
diff --git a/website/versioned_docs/version-1.0.0/s3_hoodie.md 
b/website/versioned_docs/version-1.0.0/s3_hoodie.md
index b990add7d4b7..3161ea4bd284 100644
--- a/website/versioned_docs/version-1.0.0/s3_hoodie.md
+++ b/website/versioned_docs/version-1.0.0/s3_hoodie.md
@@ -88,7 +88,7 @@ AWS glue data libraries are needed if AWS glue data is used
 
 ## AWS S3 Versioned Bucket
 
-With versioned buckets any object deleted creates a [Delete 
Marker](https://docs.aws.amazon.com/AmazonS3/latest/userguide/DeleteMarker.html),
 as Hudi cleans up files using [Cleaner 
utility](https://hudi.apache.orghoodie_cleaner) the number of Delete Markers 
increases over time.
+With versioned buckets any object deleted creates a [Delete 
Marker](https://docs.aws.amazon.com/AmazonS3/latest/userguide/DeleteMarker.html),
 as Hudi cleans up files using [Cleaner utility](cleaning) the number of Delete 
Markers increases over time.
 It is important to configure the [Lifecycle 
Rule](https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-lifecycle-mgmt.html)
 correctly
 to clean up these delete markers as the List operation can choke if the number 
of delete markers reaches 1000.
 We recommend cleaning up Delete Markers after 1 day in Lifecycle Rule.
\ No newline at end of file
diff --git a/website/versioned_docs/version-1.0.0/sql_queries.md 
b/website/versioned_docs/version-1.0.0/sql_queries.md
index 3042af0a0d05..b51b9155ddeb 100644
--- a/website/versioned_docs/version-1.0.0/sql_queries.md
+++ b/website/versioned_docs/version-1.0.0/sql_queries.md
@@ -647,7 +647,7 @@ for more details.
 ## Doris
 
 The Doris integration currently support Copy on Write and Merge On Read tables 
in Hudi since version 0.10.0. You can query Hudi tables via Doris from Doris 
version 2.0 Doris offers a multi-catalog, which is designed to make it easier 
to connect to external data catalogs to enhance Doris's data lake analysis and 
federated data query capabilities. Please refer
-to [Doris Hudi 
Catalog](https://doris.apache.org/docs/lakehouse/datalake-analytics/hudi/) for 
more details on the setup.
+to [Doris Hudi 
Catalog](https://doris.apache.org/docs/3.x/lakehouse/catalogs/hudi-catalog) for 
more details on the setup.
 
 :::note
 The current default supported version of Hudi is 0.10.0 ~ 0.13.1, and has not 
been tested in other versions. More versions will be supported in the future.
diff --git a/website/versioned_docs/version-1.0.0/structure.md 
b/website/versioned_docs/version-1.0.0/structure.md
index 137520dd2a54..0e15e353c30a 100644
--- a/website/versioned_docs/version-1.0.0/structure.md
+++ b/website/versioned_docs/version-1.0.0/structure.md
@@ -9,7 +9,7 @@ Hudi (pronounced “Hoodie”) ingests & manages storage of large 
analytical tab
 
  * **Read Optimized query** - Provides excellent query performance on pure 
columnar storage, much like plain [Parquet](https://parquet.apache.org/) tables.
  * **Incremental query** - Provides a change stream out of the dataset to feed 
downstream jobs/ETLs.
- * **Snapshot query** - Provides queries on real-time data, using a 
combination of columnar & row based storage (e.g Parquet + 
[Avro](http://avro.apache.org/docs/current/mr))
+ * **Snapshot query** - Provides queries on real-time data, using a 
combination of columnar & row based storage (e.g Parquet + 
[Avro](https://avro.apache.org/docs/++version++/mapreduce-guide/))
 
 <figure>
     <img className="docimage" 
src={require("/assets/images/hudi_intro_1.png").default} alt="hudi_intro_1.png" 
/>
diff --git a/website/versioned_docs/version-1.0.0/syncing_datahub.md 
b/website/versioned_docs/version-1.0.0/syncing_datahub.md
index 89cf9bf87996..8cad3da38442 100644
--- a/website/versioned_docs/version-1.0.0/syncing_datahub.md
+++ b/website/versioned_docs/version-1.0.0/syncing_datahub.md
@@ -3,7 +3,7 @@ title: DataHub
 keywords: [hudi, datahub, sync]
 ---
 
-[DataHub](https://datahubproject.io/) is a rich metadata platform that 
supports features like data discovery, data
+[DataHub](https://datahub.com/) is a rich metadata platform that supports 
features like data discovery, data
 obeservability, federated governance, etc.
 
 Since Hudi 0.11.0, you can now sync to a DataHub instance by setting 
`DataHubSyncTool` as one of the sync tool classes
diff --git a/website/versioned_docs/version-1.0.0/table_types.md 
b/website/versioned_docs/version-1.0.0/table_types.md
index 3b7ec911bfc0..c2ae8baab9eb 100644
--- a/website/versioned_docs/version-1.0.0/table_types.md
+++ b/website/versioned_docs/version-1.0.0/table_types.md
@@ -204,4 +204,4 @@ Refer 
[here](https://hudi.apache.org/docs/next/configurations#Flink-Options) for
 
 * [Comparing Apache Hudi's MOR and COW Tables, Use Cases from 
Uber](https://youtu.be/BiTXyzFNHlA)
 * [Different table types in Apache Hudi, MOR and COW, Deep 
Dive](https://youtu.be/vyEvlt57L-s)
-* [How to Query Hudi Tables in Incremental Fashion and Get only New data on 
AWS Glue | Hands on Lab](https://www.youtube.com/watch?v=c6DCJR91rBQx)
\ No newline at end of file
+* [How to Query Hudi Tables in Incremental Fashion and Get only New data on 
AWS Glue | Hands on Lab](https://www.youtube.com/watch?v=c6DCJR91rBQ)
\ No newline at end of file
diff --git a/website/versioned_docs/version-1.0.0/troubleshooting.md 
b/website/versioned_docs/version-1.0.0/troubleshooting.md
index 4696694d41d8..47de1002beae 100644
--- a/website/versioned_docs/version-1.0.0/troubleshooting.md
+++ b/website/versioned_docs/version-1.0.0/troubleshooting.md
@@ -40,7 +40,7 @@ You can increase `hoodie.commits.archival.batch` moving 
forward to increase the
 In addition, you can increase the difference between the 2 watermark 
configurations : `hoodie.keep.max.commits` (default : 30) 
 and `hoodie.keep.min.commits` (default : 20). This way, you can reduce the 
number of archive files created and also 
 at the same time increase the number of metadata archived per archive file. 
Note that post 0.7.0 release where we are 
-adding consolidated Hudi metadata 
([RFC-15](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+15%3A+HUDI+File+Listing+Improvements)),
 
+adding consolidated Hudi metadata 
([RFC-15](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=147427331)),
 
 the follow up work would involve re-organizing archival metadata so that we 
can do periodic compactions to control 
 file-sizing of these archive files.
 
diff --git a/website/versioned_docs/version-1.0.0/tuning-guide.md 
b/website/versioned_docs/version-1.0.0/tuning-guide.md
index 4a1f72f1b05f..107fa6e67c70 100644
--- a/website/versioned_docs/version-1.0.0/tuning-guide.md
+++ b/website/versioned_docs/version-1.0.0/tuning-guide.md
@@ -57,7 +57,7 @@ When upsert large input data, hudi spills part of input data 
to disk when reach
 
 ### How to tune shuffle parallelism of Hudi jobs ?
 
-First, let's understand what the term parallelism means in the context of Hudi 
jobs. For any Hudi job using Spark, parallelism equals to the number of spark 
partitions that should be generated for a particular stage in the DAG. To 
understand more about spark partitions, read this 
[article](https://www.dezyre.com/article/how-data-partitioning-in-spark-helps-achieve-more-parallelism/297).
 In spark, each spark partition is mapped to a spark task that can be executed 
on an executor. Typicall [...]
+First, let's understand what the term parallelism means in the context of Hudi 
jobs. For any Hudi job using Spark, parallelism equals to the number of spark 
partitions that should be generated for a particular stage in the DAG. To 
understand more about spark partitions, read this 
[article](https://www.projectpro.io/article/how-data-partitioning-in-spark-helps-achieve-more-parallelism/297).
 In spark, each spark partition is mapped to a spark task that can be executed 
on an executor. Typic [...]
 
 (Spark Application → N Spark Jobs → M Spark Stages → T Spark Tasks) on (E 
executors with C cores)
 
diff --git a/website/versioned_docs/version-1.0.1/compaction.md b/website/versioned_docs/version-1.0.1/compaction.md
index 6025d89916be..500687c658e5 100644
--- a/website/versioned_docs/version-1.0.1/compaction.md
+++ b/website/versioned_docs/version-1.0.1/compaction.md
@@ -13,7 +13,7 @@ not applicable to Copy On Write(COW) tables and only applies to MOR tables.
 
 ### Why MOR tables need compaction?
 To understand the significance of compaction in MOR tables, it is helpful to 
understand the MOR table layout first. In Hudi, 
-data is organized in terms of [file 
groups](https://hudi.apache.org/docs/file_layouts/). Each file group in a MOR 
table 
+data is organized in terms of [file groups](storage_layouts). Each file group 
in a MOR table 
 consists of a base file and one or more log files. Typically, during writes, 
inserts are stored in the base file, and updates 
 are appended to log files.
 
diff --git a/website/versioned_docs/version-1.0.1/comparison.md b/website/versioned_docs/version-1.0.1/comparison.md
index 681b359a4de8..7ba799e1453e 100644
--- a/website/versioned_docs/version-1.0.1/comparison.md
+++ b/website/versioned_docs/version-1.0.1/comparison.md
@@ -52,5 +52,5 @@ of PrestoDB/SparkSQL/Hive for your queries.
 
 More advanced use cases revolve around the concepts of [incremental 
processing](https://www.oreilly.com/ideas/ubers-case-for-incremental-processing-on-hadoop),
 which effectively
 uses Hudi even inside the `processing` engine to speed up typical batch 
pipelines. For e.g: Hudi can be used as a state store inside a processing DAG 
(similar
-to how 
[rocksDB](https://ci.apache.org/projects/flink/flink-docs-release-1.2/ops/state_backends#the-rocksdbstatebackend)
 is used by Flink). This is an item on the roadmap
-and will eventually happen as a [Beam 
Runner](https://issues.apache.org/jira/browse/HUDI-60)
+to how 
[rocksDB](https://ci.apache.org/projects/flink/flink-docs-release-1.2/ops/state_backends.html#the-rocksdbstatebackend)
 is used by Flink). This is an item on the roadmap
+and will eventually happen as a [Beam 
Runner](https://issues.apache.org/jira/browse/HUDI-60)
\ No newline at end of file
diff --git a/website/versioned_docs/version-1.0.1/configurations.md b/website/versioned_docs/version-1.0.1/configurations.md
index 0758147e0df0..3a17558accc9 100644
--- a/website/versioned_docs/version-1.0.1/configurations.md
+++ b/website/versioned_docs/version-1.0.1/configurations.md
@@ -1825,7 +1825,7 @@ These set of configs are used for Hudi Streamer utility which provides the way t
 | [hoodie.streamer.sample.writes.size](#hoodiestreamersamplewritessize)        
                                      | 5000    | Number of records to sample 
from the first write. To improve the estimation's accuracy, for smaller or more 
compressable record size, set the sample size bigger. For bigger or less 
compressable record size, set smaller.<br />`Config Param: 
SAMPLE_WRITES_SIZE`<br />`Since Version: 0.14.0`                                
                                            [...]
 | 
[hoodie.streamer.source.kafka.append.offsets](#hoodiestreamersourcekafkaappendoffsets)
                             | false   | When enabled, appends kafka offset 
info like source offset(_hoodie_kafka_source_offset), partition 
(_hoodie_kafka_source_partition) and timestamp (_hoodie_kafka_source_timestamp) 
to the records. By default its disabled and no kafka offsets are added<br 
/>`Config Param: KAFKA_APPEND_OFFSETS`                                          
                               [...]
 | 
[hoodie.streamer.source.sanitize.invalid.char.mask](#hoodiestreamersourcesanitizeinvalidcharmask)
                  | __      | Defines the character sequence that replaces 
invalid characters in schema field names if 
hoodie.streamer.source.sanitize.invalid.schema.field.names is enabled.<br 
/>`Config Param: SCHEMA_FIELD_NAME_INVALID_CHAR_MASK`                           
                                                                                
                                         [...]
-| 
[hoodie.streamer.source.sanitize.invalid.schema.field.names](#hoodiestreamersourcesanitizeinvalidschemafieldnames)
 | false   | Sanitizes names of invalid schema fields both in the data read 
from source and also in the schema Replaces invalid characters with 
hoodie.streamer.source.sanitize.invalid.char.mask. Invalid characters are by 
goes by avro naming convention 
(https://avro.apache.org/docs/current/spec.html#names).<br />`Config Param: 
SANITIZE_SCHEMA_FIELD_NAMES`                     [...]
+| 
[hoodie.streamer.source.sanitize.invalid.schema.field.names](#hoodiestreamersourcesanitizeinvalidschemafieldnames)
 | false   | Sanitizes names of invalid schema fields both in the data read 
from source and also in the schema Replaces invalid characters with 
hoodie.streamer.source.sanitize.invalid.char.mask. Invalid characters are by 
goes by avro naming convention 
(https://avro.apache.org/docs/++version++/specification/#names).<br />`Config 
Param: SANITIZE_SCHEMA_FIELD_NAMES`            [...]
 ---
 
 
diff --git a/website/versioned_docs/version-1.0.1/faq_storage.md b/website/versioned_docs/version-1.0.1/faq_storage.md
index fcce76aa46e1..8917fdcb9abb 100644
--- a/website/versioned_docs/version-1.0.1/faq_storage.md
+++ b/website/versioned_docs/version-1.0.1/faq_storage.md
@@ -47,7 +47,7 @@ The indexing component is a key part of the Hudi writing and it maps a given rec
 Hudi supports a few options for indexing as below
 
 *   _HoodieBloomIndex_ : Uses a bloom filter and ranges information placed in 
the footer of parquet/base files (and soon log files as well)
-*   _HoodieGlobalBloomIndex_ : The non global indexing only enforces 
uniqueness of a key inside a single partition i.e the user is expected to know 
the partition under which a given record key is stored. This helps the indexing 
scale very well for even [very large 
datasets](https://eng.uber.com/uber-big-data-platform/). However, in some 
cases, it might be necessary instead to do the de-duping/enforce uniqueness 
across all partitions and the global bloom index does exactly that. If this i 
[...]
+*   _HoodieGlobalBloomIndex_ : The non global indexing only enforces 
uniqueness of a key inside a single partition i.e the user is expected to know 
the partition under which a given record key is stored. This helps the indexing 
scale very well for even [very large 
datasets](https://www.uber.com/en-IN/blog/uber-big-data-platform/). However, in 
some cases, it might be necessary instead to do the de-duping/enforce 
uniqueness across all partitions and the global bloom index does exactly that 
[...]
 *   _HBaseIndex_ : Apache HBase is a key value store, typically found in close 
proximity to HDFS. You can also store the index inside HBase, which could be 
handy if you are already operating HBase.
 *   _HoodieSimpleIndex (default)_ : A simple index which reads interested 
fields (record key and partition path) from base files and joins with incoming 
records to find the tagged location.
 *   _HoodieGlobalSimpleIndex_ : Global version of Simple Index, where in 
uniqueness is on record key across entire table.
diff --git a/website/versioned_docs/version-1.0.1/hudi_stack.md b/website/versioned_docs/version-1.0.1/hudi_stack.md
index d3e0fb335353..7989c59fff79 100644
--- a/website/versioned_docs/version-1.0.1/hudi_stack.md
+++ b/website/versioned_docs/version-1.0.1/hudi_stack.md
@@ -57,7 +57,7 @@ File Slices. File groups contain multiple versions of File Slices and are split
 the file-group is uniquely identified by the write that created its base file 
or the first log file, which helps order the File Slices.
 
 - **Metadata Table** : Implemented as another merge-on-read Hudi table, the 
[metadata table](./metadata) efficiently handles quick updates with low write 
amplification. 
-It leverages a 
[SSTable](https://cassandra.apache.org/doc/stable/cassandra/architecture/storage_engine.html#sstables)
 based file format for quick, indexed key lookups, 
+It leverages a 
[SSTable](https://cassandra.apache.org/doc/stable/cassandra/architecture/storage-engine.html#sstables)
 based file format for quick, indexed key lookups, 
 storing vital information like file paths, column statistics and schema. This 
approach streamlines operations by reducing the necessity for expensive cloud 
file listings. 
 
 Hudi’s approach of recording updates into Log Files is more efficient and 
involves low merge overhead than systems like Hive ACID, where merging all 
delta records against 
diff --git a/website/versioned_docs/version-1.0.1/metadata.md b/website/versioned_docs/version-1.0.1/metadata.md
index 8f3b403112ac..fe8827ebeec5 100644
--- a/website/versioned_docs/version-1.0.1/metadata.md
+++ b/website/versioned_docs/version-1.0.1/metadata.md
@@ -46,7 +46,7 @@ is tracked using internal tables. This approach provides the following advantage
 
 Following are the different types of metadata currently supported.
 
-- ***[files 
listings](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+15%3A+HUDI+File+Listing+Improvements)***:
 
+- ***[files 
listings](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=147427331)***:
 
   Stored as *files* partition in the metadata table. Contains file information 
such as file name, size, and active state
   for each partition in the data table, along with list of all partitions in 
the table. Improves the files listing performance 
   by avoiding direct storage calls such as *exists, listStatus* and 
*listFiles* on the data table.
diff --git a/website/versioned_docs/version-1.0.1/overview.mdx b/website/versioned_docs/version-1.0.1/overview.mdx
index bb8910f9c7ed..1e55d6916f3a 100644
--- a/website/versioned_docs/version-1.0.1/overview.mdx
+++ b/website/versioned_docs/version-1.0.1/overview.mdx
@@ -25,7 +25,7 @@ but it also allows you to create efficient incremental batch pipelines. Apache H
 Hudi’s advanced performance optimizations, make analytical queries/pipelines 
faster with any of the popular query engines including, Apache Spark, Flink, 
Presto, Trino, Hive, etc.
 
 Read the docs for more [use case descriptions](/docs/use_cases) and check out 
[who's using Hudi](/powered-by), to see how some of the
-largest data lakes in the world including 
[Uber](https://eng.uber.com/uber-big-data-platform/), 
[Amazon](https://aws.amazon.com/blogs/big-data/how-amazon-transportation-service-enabled-near-real-time-event-analytics-at-petabyte-scale-using-aws-glue-with-apache-hudi/),
+largest data lakes in the world including 
[Uber](https://www.uber.com/en-IN/blog/uber-big-data-platform/), 
[Amazon](https://aws.amazon.com/blogs/big-data/how-amazon-transportation-service-enabled-near-real-time-event-analytics-at-petabyte-scale-using-aws-glue-with-apache-hudi/),
 
[ByteDance](http://hudi.apache.org/blog/2021/09/01/building-eb-level-data-lake-using-hudi-at-bytedance),
 [Robinhood](https://s.apache.org/hudi-robinhood-talk) and more are 
transforming their production data lakes with Hudi.
 
diff --git a/website/versioned_docs/version-1.0.1/reading_tables_batch_reads.md b/website/versioned_docs/version-1.0.1/reading_tables_batch_reads.md
index d247fd4c3d08..f3ddcd236694 100644
--- a/website/versioned_docs/version-1.0.1/reading_tables_batch_reads.md
+++ b/website/versioned_docs/version-1.0.1/reading_tables_batch_reads.md
@@ -32,4 +32,4 @@ df = df.where(df["foo"] > 5)
 df.show()
 ```
 
-Check out the Daft docs for [Hudi 
integration](https://www.getdaft.io/projects/docs/en/latest/user_guide/integrations/hudi.html).
+Check out the Daft docs for [Hudi 
integration](https://docs.daft.ai/en/stable/connectors/hudi/).
diff --git a/website/versioned_docs/version-1.0.1/s3_hoodie.md b/website/versioned_docs/version-1.0.1/s3_hoodie.md
index 37f79ae75342..3161ea4bd284 100644
--- a/website/versioned_docs/version-1.0.1/s3_hoodie.md
+++ b/website/versioned_docs/version-1.0.1/s3_hoodie.md
@@ -88,7 +88,7 @@ AWS glue data libraries are needed if AWS glue data is used
 
 ## AWS S3 Versioned Bucket
 
-With versioned buckets any object deleted creates a [Delete 
Marker](https://docs.aws.amazon.com/AmazonS3/latest/userguide/DeleteMarker.html),
 as Hudi cleans up files using [Cleaner 
utility](https://hudi.apache.org/docs/hoodie_cleaner) the number of Delete 
Markers increases over time.
+With versioned buckets any object deleted creates a [Delete 
Marker](https://docs.aws.amazon.com/AmazonS3/latest/userguide/DeleteMarker.html),
 as Hudi cleans up files using [Cleaner utility](cleaning) the number of Delete 
Markers increases over time.
 It is important to configure the [Lifecycle 
Rule](https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-lifecycle-mgmt.html)
 correctly
 to clean up these delete markers as the List operation can choke if the number 
of delete markers reaches 1000.
 We recommend cleaning up Delete Markers after 1 day in Lifecycle Rule.
\ No newline at end of file
diff --git a/website/versioned_docs/version-1.0.1/sql_queries.md b/website/versioned_docs/version-1.0.1/sql_queries.md
index 3042af0a0d05..b51b9155ddeb 100644
--- a/website/versioned_docs/version-1.0.1/sql_queries.md
+++ b/website/versioned_docs/version-1.0.1/sql_queries.md
@@ -647,7 +647,7 @@ for more details.
 ## Doris
 
 The Doris integration currently support Copy on Write and Merge On Read tables 
in Hudi since version 0.10.0. You can query Hudi tables via Doris from Doris 
version 2.0 Doris offers a multi-catalog, which is designed to make it easier 
to connect to external data catalogs to enhance Doris's data lake analysis and 
federated data query capabilities. Please refer
-to [Doris Hudi 
Catalog](https://doris.apache.org/docs/lakehouse/datalake-analytics/hudi/) for 
more details on the setup.
+to [Doris Hudi 
Catalog](https://doris.apache.org/docs/3.x/lakehouse/catalogs/hudi-catalog) for 
more details on the setup.
 
 :::note
 The current default supported version of Hudi is 0.10.0 ~ 0.13.1, and has not 
been tested in other versions. More versions will be supported in the future.
diff --git a/website/versioned_docs/version-1.0.1/structure.md b/website/versioned_docs/version-1.0.1/structure.md
index 137520dd2a54..0e15e353c30a 100644
--- a/website/versioned_docs/version-1.0.1/structure.md
+++ b/website/versioned_docs/version-1.0.1/structure.md
@@ -9,7 +9,7 @@ Hudi (pronounced “Hoodie”) ingests & manages storage of large analytical tab
 
  * **Read Optimized query** - Provides excellent query performance on pure 
columnar storage, much like plain [Parquet](https://parquet.apache.org/) tables.
  * **Incremental query** - Provides a change stream out of the dataset to feed 
downstream jobs/ETLs.
- * **Snapshot query** - Provides queries on real-time data, using a 
combination of columnar & row based storage (e.g Parquet + 
[Avro](http://avro.apache.org/docs/current/mr))
+ * **Snapshot query** - Provides queries on real-time data, using a 
combination of columnar & row based storage (e.g Parquet + 
[Avro](https://avro.apache.org/docs/++version++/mapreduce-guide/))
 
 <figure>
     <img className="docimage" 
src={require("/assets/images/hudi_intro_1.png").default} alt="hudi_intro_1.png" 
/>
diff --git a/website/versioned_docs/version-1.0.1/syncing_datahub.md b/website/versioned_docs/version-1.0.1/syncing_datahub.md
index 2a8003a2eec6..28803704c161 100644
--- a/website/versioned_docs/version-1.0.1/syncing_datahub.md
+++ b/website/versioned_docs/version-1.0.1/syncing_datahub.md
@@ -3,7 +3,7 @@ title: DataHub
 keywords: [hudi, datahub, sync]
 ---
 
-[DataHub](https://datahubproject.io/) is a rich metadata platform that 
supports features like data discovery, data
+[DataHub](https://datahub.com/) is a rich metadata platform that supports 
features like data discovery, data
 obeservability, federated governance, etc.
 
 Since Hudi 0.11.0, you can now sync to a DataHub instance by setting 
`DataHubSyncTool` as one of the sync tool classes
diff --git a/website/versioned_docs/version-1.0.1/troubleshooting.md b/website/versioned_docs/version-1.0.1/troubleshooting.md
index 4696694d41d8..47de1002beae 100644
--- a/website/versioned_docs/version-1.0.1/troubleshooting.md
+++ b/website/versioned_docs/version-1.0.1/troubleshooting.md
@@ -40,7 +40,7 @@ You can increase `hoodie.commits.archival.batch` moving forward to increase the
 In addition, you can increase the difference between the 2 watermark 
configurations : `hoodie.keep.max.commits` (default : 30) 
 and `hoodie.keep.min.commits` (default : 20). This way, you can reduce the 
number of archive files created and also 
 at the same time increase the number of metadata archived per archive file. 
Note that post 0.7.0 release where we are 
-adding consolidated Hudi metadata 
([RFC-15](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+15%3A+HUDI+File+Listing+Improvements)),
 
+adding consolidated Hudi metadata 
([RFC-15](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=147427331)),
 
 the follow up work would involve re-organizing archival metadata so that we 
can do periodic compactions to control 
 file-sizing of these archive files.
 
diff --git a/website/versioned_docs/version-1.0.1/tuning-guide.md b/website/versioned_docs/version-1.0.1/tuning-guide.md
index 4a1f72f1b05f..107fa6e67c70 100644
--- a/website/versioned_docs/version-1.0.1/tuning-guide.md
+++ b/website/versioned_docs/version-1.0.1/tuning-guide.md
@@ -57,7 +57,7 @@ When upsert large input data, hudi spills part of input data to disk when reach
 
 ### How to tune shuffle parallelism of Hudi jobs ?
 
-First, let's understand what the term parallelism means in the context of Hudi 
jobs. For any Hudi job using Spark, parallelism equals to the number of spark 
partitions that should be generated for a particular stage in the DAG. To 
understand more about spark partitions, read this 
[article](https://www.dezyre.com/article/how-data-partitioning-in-spark-helps-achieve-more-parallelism/297).
 In spark, each spark partition is mapped to a spark task that can be executed 
on an executor. Typicall [...]
+First, let's understand what the term parallelism means in the context of Hudi 
jobs. For any Hudi job using Spark, parallelism equals to the number of spark 
partitions that should be generated for a particular stage in the DAG. To 
understand more about spark partitions, read this 
[article](https://www.projectpro.io/article/how-data-partitioning-in-spark-helps-achieve-more-parallelism/297).
 In spark, each spark partition is mapped to a spark task that can be executed 
on an executor. Typic [...]
 
 (Spark Application → N Spark Jobs → M Spark Stages → T Spark Tasks) on (E 
executors with C cores)
 
diff --git a/website/versioned_docs/version-1.0.2/compaction.md b/website/versioned_docs/version-1.0.2/compaction.md
index 6af8e6361c19..35d781fa44e4 100644
--- a/website/versioned_docs/version-1.0.2/compaction.md
+++ b/website/versioned_docs/version-1.0.2/compaction.md
@@ -13,7 +13,7 @@ not applicable to Copy On Write(COW) tables and only applies to MOR tables.
 
 ### Why MOR tables need compaction?
 To understand the significance of compaction in MOR tables, it is helpful to 
understand the MOR table layout first. In Hudi, 
-data is organized in terms of [file 
groups](https://hudi.apache.org/docs/file_layouts/). Each file group in a MOR 
table 
+data is organized in terms of [file groups](/docs/storage_layouts/). Each file 
group in a MOR table 
 consists of a base file and one or more log files. Typically, during writes, 
inserts are stored in the base file, and updates 
 are appended to log files.
 
diff --git a/website/versioned_docs/version-1.0.2/comparison.md b/website/versioned_docs/version-1.0.2/comparison.md
index 681b359a4de8..0bcce2ace532 100644
--- a/website/versioned_docs/version-1.0.2/comparison.md
+++ b/website/versioned_docs/version-1.0.2/comparison.md
@@ -52,5 +52,5 @@ of PrestoDB/SparkSQL/Hive for your queries.
 
 More advanced use cases revolve around the concepts of [incremental 
processing](https://www.oreilly.com/ideas/ubers-case-for-incremental-processing-on-hadoop),
 which effectively
 uses Hudi even inside the `processing` engine to speed up typical batch 
pipelines. For e.g: Hudi can be used as a state store inside a processing DAG 
(similar
-to how 
[rocksDB](https://ci.apache.org/projects/flink/flink-docs-release-1.2/ops/state_backends#the-rocksdbstatebackend)
 is used by Flink). This is an item on the roadmap
+to how 
[rocksDB](https://ci.apache.org/projects/flink/flink-docs-release-1.2/ops/state_backends.html#the-rocksdbstatebackend)
 is used by Flink). This is an item on the roadmap
 and will eventually happen as a [Beam 
Runner](https://issues.apache.org/jira/browse/HUDI-60)
diff --git a/website/versioned_docs/version-1.0.2/configurations.md b/website/versioned_docs/version-1.0.2/configurations.md
index 2e05446ae5a7..022b2b172f23 100644
--- a/website/versioned_docs/version-1.0.2/configurations.md
+++ b/website/versioned_docs/version-1.0.2/configurations.md
@@ -1851,7 +1851,7 @@ These set of configs are used for Hudi Streamer utility which provides the way t
 | [hoodie.streamer.sample.writes.size](#hoodiestreamersamplewritessize)        
                                      | 5000    | Number of records to sample 
from the first write. To improve the estimation's accuracy, for smaller or more 
compressable record size, set the sample size bigger. For bigger or less 
compressable record size, set smaller.<br />`Config Param: 
SAMPLE_WRITES_SIZE`<br />`Since Version: 0.14.0`                                
                                            [...]
 | 
[hoodie.streamer.source.kafka.append.offsets](#hoodiestreamersourcekafkaappendoffsets)
                             | false   | When enabled, appends kafka offset 
info like source offset(_hoodie_kafka_source_offset), partition 
(_hoodie_kafka_source_partition) and timestamp (_hoodie_kafka_source_timestamp) 
to the records. By default its disabled and no kafka offsets are added<br 
/>`Config Param: KAFKA_APPEND_OFFSETS`                                          
                               [...]
 | 
[hoodie.streamer.source.sanitize.invalid.char.mask](#hoodiestreamersourcesanitizeinvalidcharmask)
                  | __      | Defines the character sequence that replaces 
invalid characters in schema field names if 
hoodie.streamer.source.sanitize.invalid.schema.field.names is enabled.<br 
/>`Config Param: SCHEMA_FIELD_NAME_INVALID_CHAR_MASK`                           
                                                                                
                                         [...]
-| 
[hoodie.streamer.source.sanitize.invalid.schema.field.names](#hoodiestreamersourcesanitizeinvalidschemafieldnames)
 | false   | Sanitizes names of invalid schema fields both in the data read 
from source and also in the schema Replaces invalid characters with 
hoodie.streamer.source.sanitize.invalid.char.mask. Invalid characters are by 
goes by avro naming convention 
(https://avro.apache.org/docs/current/spec.html#names).<br />`Config Param: 
SANITIZE_SCHEMA_FIELD_NAMES`                     [...]
+| 
[hoodie.streamer.source.sanitize.invalid.schema.field.names](#hoodiestreamersourcesanitizeinvalidschemafieldnames)
 | false   | Sanitizes names of invalid schema fields both in the data read 
from source and also in the schema Replaces invalid characters with 
hoodie.streamer.source.sanitize.invalid.char.mask. Invalid characters are by 
goes by avro naming convention 
(https://avro.apache.org/docs/++version++/specification/#names).<br />`Config 
Param: SANITIZE_SCHEMA_FIELD_NAMES`            [...]
 ---
 
 
diff --git a/website/versioned_docs/version-1.0.2/faq_storage.md b/website/versioned_docs/version-1.0.2/faq_storage.md
index fcce76aa46e1..8917fdcb9abb 100644
--- a/website/versioned_docs/version-1.0.2/faq_storage.md
+++ b/website/versioned_docs/version-1.0.2/faq_storage.md
@@ -47,7 +47,7 @@ The indexing component is a key part of the Hudi writing and it maps a given rec
 Hudi supports a few options for indexing as below
 
 *   _HoodieBloomIndex_ : Uses a bloom filter and ranges information placed in 
the footer of parquet/base files (and soon log files as well)
-*   _HoodieGlobalBloomIndex_ : The non global indexing only enforces 
uniqueness of a key inside a single partition i.e the user is expected to know 
the partition under which a given record key is stored. This helps the indexing 
scale very well for even [very large 
datasets](https://eng.uber.com/uber-big-data-platform/). However, in some 
cases, it might be necessary instead to do the de-duping/enforce uniqueness 
across all partitions and the global bloom index does exactly that. If this i 
[...]
+*   _HoodieGlobalBloomIndex_ : The non global indexing only enforces 
uniqueness of a key inside a single partition i.e the user is expected to know 
the partition under which a given record key is stored. This helps the indexing 
scale very well for even [very large 
datasets](https://www.uber.com/en-IN/blog/uber-big-data-platform/). However, in 
some cases, it might be necessary instead to do the de-duping/enforce 
uniqueness across all partitions and the global bloom index does exactly that 
[...]
 *   _HBaseIndex_ : Apache HBase is a key value store, typically found in close 
proximity to HDFS. You can also store the index inside HBase, which could be 
handy if you are already operating HBase.
 *   _HoodieSimpleIndex (default)_ : A simple index which reads interested 
fields (record key and partition path) from base files and joins with incoming 
records to find the tagged location.
 *   _HoodieGlobalSimpleIndex_ : Global version of Simple Index, where in 
uniqueness is on record key across entire table.
diff --git a/website/versioned_docs/version-1.0.2/hudi_stack.md b/website/versioned_docs/version-1.0.2/hudi_stack.md
index d28231244187..64d28643d39d 100644
--- a/website/versioned_docs/version-1.0.2/hudi_stack.md
+++ b/website/versioned_docs/version-1.0.2/hudi_stack.md
@@ -49,19 +49,19 @@ bring any compute engine for specific workloads.
 Drawing an analogy to file formats, a table format simply concerns with how 
files are distributed with the table, partitioning schemes, schema and metadata 
tracking changes. Hudi organizes files within a table or partition into 
 File Groups. Updates are captured in log files tied to these File Groups, 
ensuring efficient merges. There are three major components related to Hudi’s 
table format.
 
-- **Timeline** : Hudi's [timeline](./timeline), stored in the 
`/.hoodie/timeline` folder, is a crucial event log recording all table actions 
in an ordered manner, 
+- **Timeline** : Hudi's [timeline](/docs/timeline), stored in the 
`/.hoodie/timeline` folder, is a crucial event log recording all table actions 
in an ordered manner, 
   with events kept for a specified period. Hudi uniquely designs each File 
Group as a self-contained log, enabling record state reconstruction through 
delta logs, even after archival of historical actions. This approach 
effectively limits metadata size based on table activity frequency, essential 
for managing tables with frequent updates.
 
 - **File Group and File Slice** : Within each partition the data is physically 
stored as base and Log Files and organized into logical concepts as [File 
groups](https://hudi.apache.org/tech-specs-1point0/#storage-layout) and 
 File Slices. File groups contain multiple versions of File Slices and are 
split into multiple File Slices. A File Slice comprises the Base and Log File. 
Each File Slice within 
 the file-group is uniquely identified by the write that created its base file 
or the first log file, which helps order the File Slices.
 
-- **Metadata Table** : Implemented as another merge-on-read Hudi table, the 
[metadata table](./metadata) efficiently handles quick updates with low write 
amplification. 
-It leverages a 
[SSTable](https://cassandra.apache.org/doc/stable/cassandra/architecture/storage_engine.html#sstables)
 based file format for quick, indexed key lookups, 
+- **Metadata Table** : Implemented as another merge-on-read Hudi table, the 
[metadata table](/docs/metadata) efficiently handles quick updates with low 
write amplification. 
+It leverages a 
[SSTable](https://cassandra.apache.org/doc/stable/cassandra/architecture/storage-engine.html#sstables)
 based file format for quick, indexed key lookups, 
 storing vital information like file paths, column statistics and schema. This 
approach streamlines operations by reducing the necessity for expensive cloud 
file listings. 
 
 Hudi’s approach of recording updates into Log Files is more efficient and 
involves low merge overhead than systems like Hive ACID, where merging all 
delta records against 
-all Base Files is required. Read more about the various table types in Hudi 
[here](./table_types).
+all Base Files is required. Read more about the various table types in Hudi 
[here](/docs/table_types).
 
 
 ## Storage Engine
@@ -74,8 +74,8 @@ Cassandra and Clickhouse.
 ![Indexes](/assets/images/hudi-stack-indexes.png)
 <p align = "center">Figure: Indexes in Hudi</p>
 
-[Indexes](./indexes) in Hudi enhance query planning, minimizing I/O, speeding 
up response times and providing faster writes with low merge costs. The 
[metadata table](./metadata/#metadata-table-indices) acts 
-as an additional [indexing 
system](./metadata#supporting-multi-modal-index-in-hudi) and brings the 
benefits of indexes generally to both the readers and writers. Compute engines 
can leverage various indexes in the metadata
+[Indexes](/docs/indexes) in Hudi enhance query planning, minimizing I/O, 
speeding up response times and providing faster writes with low merge costs. 
The [metadata table](/docs/metadata/#metadata-table-indices) acts 
+as an additional [indexing 
system](/docs/metadata#supporting-multi-modal-index-in-hudi) and brings the 
benefits of indexes generally to both the readers and writers. Compute engines 
can leverage various indexes in the metadata
 table, like file listings, column statistics, bloom filters, record-level 
indexes, and [expression 
indexes](https://github.com/apache/hudi/blob/master/rfc/rfc-63/rfc-63.md) to 
quickly generate optimized query plans and improve read 
 performance. In addition to the metadata table indexes, Hudi supports simple 
join based indexing, bloom filters stored in base file footers, external 
key-value stores like HBase, 
 and optimized storage techniques like bucketing , to efficiently locate File 
Groups containing specific record keys. Hudi also provides reader indexes such 
as 
[expression](https://github.com/apache/hudi/blob/master/rfc/rfc-63/rfc-63.md) 
and 
@@ -91,12 +91,12 @@ running them in inline, semi-asynchronous or full-asynchronous modes. Furthermor
 asynchronously sharing the underlying executors intelligently with writers. 
Let’s take a look at these services.
 
 #### Clustering
-The [clustering](./clustering) service, akin to features in cloud data 
warehouses, allows users to group frequently queried records using sort keys or 
merge smaller Base Files into 
+The [clustering](/docs/clustering) service, akin to features in cloud data 
warehouses, allows users to group frequently queried records using sort keys or 
merge smaller Base Files into 
 larger ones for optimal file size management. It's fully integrated with other 
timeline actions like cleaning and compaction, enabling smart optimizations 
such as avoiding 
 compaction for File Groups undergoing clustering, thereby saving on I/O.
 
 #### Compaction
-Hudi's [compaction](./compaction) service, featuring strategies like date partitioning and I/O bounding, merges Base Files with delta logs to create updated Base Files. It allows
+Hudi's [compaction](/docs/compaction) service, featuring strategies like date partitioning and I/O bounding, merges Base Files with delta logs to create updated Base Files. It allows
 concurrent writes to the same File Froup, enabled by Hudi's file grouping and flexible log merging. This facilitates non-blocking execution of deletes even during concurrent
 record updates.
 
@@ -107,11 +107,11 @@ while also allowing sufficient time for long running batch jobs (e.g Hive ETLs)
 #### Indexing
 Hudi's scalable metadata table contains auxiliary data about the table. This subsystem encompasses various indices, including files, column_stats, and bloom_filters,
 facilitating efficient record location and data skipping. Balancing write throughput with index updates presents a fundamental challenge, as traditional indexing methods,
-like locking out writes during indexing, are impractical for large tables due to lengthy processing times. Hudi addresses this with its innovative asynchronous [metadata indexing](./metadata_indexing),
+like locking out writes during indexing, are impractical for large tables due to lengthy processing times. Hudi addresses this with its innovative asynchronous [metadata indexing](/docs/metadata_indexing),
 enabling the creation of various indices without impeding writes. This approach not only improves write latency but also minimizes resource waste by reducing contention between writing and indexing activities.
 
 ### Concurrency Control
-[Concurrency control](./concurrency_control) defines how different writers/readers/table services coordinate access to the table. Hudi uses monotonically increasing time to sequence and order various
+[Concurrency control](/docs/concurrency_control) defines how different writers/readers/table services coordinate access to the table. Hudi uses monotonically increasing time to sequence and order various
 changes to table state. Much like databases, Hudi take an approach of clearly differentiating between writers (responsible for upserts/deletes), table services
 (focusing on storage optimization and bookkeeping), and readers (for query execution). Hudi provides snapshot isolation, offering a consistent view of the table across
 these different operations. It employs lock-free, non-blocking MVCC for concurrency between writers and table-services, as well as between different table services, and
@@ -154,12 +154,12 @@ integration with engines written in C/C++.
 <p align = "center">Figure: Various platform services in Hudi</p>
 
 Platform services offer functionality that is specific to data and workloads, and they sit directly on top of the table services, interfacing with writers and readers.
-Services, like [Hudi Streamer](./hoodie_streaming_ingestion#hudi-streamer) (or its Flink counterpart), are specialized in handling data and workloads, seamlessly integrating with Kafka streams and various
+Services, like [Hudi Streamer](/docs/hoodie_streaming_ingestion#hudi-streamer) (or its Flink counterpart), are specialized in handling data and workloads, seamlessly integrating with Kafka streams and various
 formats to build data lakes. They support functionalities like automatic checkpoint management, integration with major schema registries (including Confluent), and
 deduplication of data. Hudi Streamer also offers features for backfills, one-off runs, and continuous mode operation with Spark/Flink streaming writers. Additionally,
-Hudi provides tools for [snapshotting](./snapshot_exporter) and incrementally [exporting](./snapshot_exporter#examples) Hudi tables, importing new tables, and [post-commit callback](platform_services_post_commit_callback) for analytics or
+Hudi provides tools for [snapshotting](/docs/snapshot_exporter) and incrementally [exporting](/docs/snapshot_exporter#examples) Hudi tables, importing new tables, and [post-commit callback](/docs/platform_services_post_commit_callback) for analytics or
 workflow management, enhancing the deployment of production-grade incremental pipelines. Apart from these services, Hudi also provides broad support for different
-catalogs such as [Hive Metastore](./syncing_metastore), [AWS Glue](./syncing_aws_glue_data_catalog/), [Google BigQuery](./gcp_bigquery), [DataHub](./syncing_datahub), etc. that allows syncing of Hudi tables to be queried by
+catalogs such as [Hive Metastore](/docs/syncing_metastore), [AWS Glue](/docs/syncing_aws_glue_data_catalog/), [Google BigQuery](/docs/gcp_bigquery), [DataHub](/docs/syncing_datahub), etc. that allows syncing of Hudi tables to be queried by
 interactive engines such as Trino and Presto.
 
 ### Metaserver*
diff --git a/website/versioned_docs/version-1.0.2/metadata.md b/website/versioned_docs/version-1.0.2/metadata.md
index 8f3b403112ac..fe8827ebeec5 100644
--- a/website/versioned_docs/version-1.0.2/metadata.md
+++ b/website/versioned_docs/version-1.0.2/metadata.md
@@ -46,7 +46,7 @@ is tracked using internal tables. This approach provides the following advantage
 
 Following are the different types of metadata currently supported.
 
-- ***[files listings](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+15%3A+HUDI+File+Listing+Improvements)***:
+- ***[files listings](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=147427331)***:
   Stored as *files* partition in the metadata table. Contains file information such as file name, size, and active state
   for each partition in the data table, along with list of all partitions in the table. Improves the files listing performance
   by avoiding direct storage calls such as *exists, listStatus* and *listFiles* on the data table.
diff --git a/website/versioned_docs/version-1.0.2/overview.mdx b/website/versioned_docs/version-1.0.2/overview.mdx
index bb8910f9c7ed..1e55d6916f3a 100644
--- a/website/versioned_docs/version-1.0.2/overview.mdx
+++ b/website/versioned_docs/version-1.0.2/overview.mdx
@@ -25,7 +25,7 @@ but it also allows you to create efficient incremental batch pipelines. Apache H
 Hudi’s advanced performance optimizations, make analytical queries/pipelines faster with any of the popular query engines including, Apache Spark, Flink, Presto, Trino, Hive, etc.
 
 Read the docs for more [use case descriptions](/docs/use_cases) and check out [who's using Hudi](/powered-by), to see how some of the
-largest data lakes in the world including [Uber](https://eng.uber.com/uber-big-data-platform/), [Amazon](https://aws.amazon.com/blogs/big-data/how-amazon-transportation-service-enabled-near-real-time-event-analytics-at-petabyte-scale-using-aws-glue-with-apache-hudi/),
+largest data lakes in the world including [Uber](https://www.uber.com/en-IN/blog/uber-big-data-platform/), [Amazon](https://aws.amazon.com/blogs/big-data/how-amazon-transportation-service-enabled-near-real-time-event-analytics-at-petabyte-scale-using-aws-glue-with-apache-hudi/),
 [ByteDance](http://hudi.apache.org/blog/2021/09/01/building-eb-level-data-lake-using-hudi-at-bytedance),
 [Robinhood](https://s.apache.org/hudi-robinhood-talk) and more are transforming their production data lakes with Hudi.
 
diff --git a/website/versioned_docs/version-1.0.2/s3_hoodie.md b/website/versioned_docs/version-1.0.2/s3_hoodie.md
index 37f79ae75342..fac2f76d61d2 100644
--- a/website/versioned_docs/version-1.0.2/s3_hoodie.md
+++ b/website/versioned_docs/version-1.0.2/s3_hoodie.md
@@ -88,7 +88,7 @@ AWS glue data libraries are needed if AWS glue data is used
 
 ## AWS S3 Versioned Bucket
 
-With versioned buckets any object deleted creates a [Delete Marker](https://docs.aws.amazon.com/AmazonS3/latest/userguide/DeleteMarker.html), as Hudi cleans up files using [Cleaner utility](https://hudi.apache.org/docs/hoodie_cleaner) the number of Delete Markers increases over time.
+With versioned buckets any object deleted creates a [Delete Marker](https://docs.aws.amazon.com/AmazonS3/latest/userguide/DeleteMarker.html), as Hudi cleans up files using [Cleaner utility](/docs/cleaning) the number of Delete Markers increases over time.
 It is important to configure the [Lifecycle Rule](https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-lifecycle-mgmt.html) correctly
 to clean up these delete markers as the List operation can choke if the number of delete markers reaches 1000.
 We recommend cleaning up Delete Markers after 1 day in Lifecycle Rule.
\ No newline at end of file
diff --git a/website/versioned_docs/version-1.0.2/sql_queries.md b/website/versioned_docs/version-1.0.2/sql_queries.md
index 6f8c33026e05..d1fa84b9578b 100644
--- a/website/versioned_docs/version-1.0.2/sql_queries.md
+++ b/website/versioned_docs/version-1.0.2/sql_queries.md
@@ -647,7 +647,7 @@ for more details.
 ## Doris
 
 The Doris integration currently support Copy on Write and Merge On Read tables in Hudi since version 0.10.0. You can query Hudi tables via Doris from Doris version 2.0. Doris offers a multi-catalog, which is designed to make it easier to connect to external data catalogs to enhance Doris's data lake analysis and federated data query capabilities. Please refer
-to [Doris Hudi Catalog](https://doris.apache.org/docs/lakehouse/datalake-analytics/hudi/) for more details on the setup.
+to [Doris Hudi Catalog](https://doris.apache.org/docs/3.x/lakehouse/catalogs/hudi-catalog) for more details on the setup.
 
 :::note
 The current default supported version of Hudi is 0.10.0 ~ 0.13.1, and has not been tested in other versions. More versions will be supported in the future.
diff --git a/website/versioned_docs/version-1.0.2/structure.md b/website/versioned_docs/version-1.0.2/structure.md
index 137520dd2a54..0e15e353c30a 100644
--- a/website/versioned_docs/version-1.0.2/structure.md
+++ b/website/versioned_docs/version-1.0.2/structure.md
@@ -9,7 +9,7 @@ Hudi (pronounced “Hoodie”) ingests & manages storage of large analytical tab
 
  * **Read Optimized query** - Provides excellent query performance on pure columnar storage, much like plain [Parquet](https://parquet.apache.org/) tables.
  * **Incremental query** - Provides a change stream out of the dataset to feed downstream jobs/ETLs.
- * **Snapshot query** - Provides queries on real-time data, using a combination of columnar & row based storage (e.g Parquet + [Avro](http://avro.apache.org/docs/current/mr))
+ * **Snapshot query** - Provides queries on real-time data, using a combination of columnar & row based storage (e.g Parquet + [Avro](https://avro.apache.org/docs/++version++/mapreduce-guide/))
 
 <figure>
     <img className="docimage" src={require("/assets/images/hudi_intro_1.png").default} alt="hudi_intro_1.png" />
diff --git a/website/versioned_docs/version-1.0.2/syncing_datahub.md b/website/versioned_docs/version-1.0.2/syncing_datahub.md
index 2a8003a2eec6..28803704c161 100644
--- a/website/versioned_docs/version-1.0.2/syncing_datahub.md
+++ b/website/versioned_docs/version-1.0.2/syncing_datahub.md
@@ -3,7 +3,7 @@ title: DataHub
 keywords: [hudi, datahub, sync]
 ---
 
-[DataHub](https://datahubproject.io/) is a rich metadata platform that supports features like data discovery, data
+[DataHub](https://datahub.com/) is a rich metadata platform that supports features like data discovery, data
 obeservability, federated governance, etc.
 
 Since Hudi 0.11.0, you can now sync to a DataHub instance by setting `DataHubSyncTool` as one of the sync tool classes
diff --git a/website/versioned_docs/version-1.0.2/table_types.md b/website/versioned_docs/version-1.0.2/table_types.md
index 3b7ec911bfc0..c2ae8baab9eb 100644
--- a/website/versioned_docs/version-1.0.2/table_types.md
+++ b/website/versioned_docs/version-1.0.2/table_types.md
@@ -204,4 +204,4 @@ Refer [here](https://hudi.apache.org/docs/next/configurations#Flink-Options) for
 
 * [Comparing Apache Hudi's MOR and COW Tables, Use Cases from Uber](https://youtu.be/BiTXyzFNHlA)
 * [Different table types in Apache Hudi, MOR and COW, Deep Dive](https://youtu.be/vyEvlt57L-s)
-* [How to Query Hudi Tables in Incremental Fashion and Get only New data on AWS Glue | Hands on Lab](https://www.youtube.com/watch?v=c6DCJR91rBQx)
\ No newline at end of file
+* [How to Query Hudi Tables in Incremental Fashion and Get only New data on AWS Glue | Hands on Lab](https://www.youtube.com/watch?v=c6DCJR91rBQ)
\ No newline at end of file
diff --git a/website/versioned_docs/version-1.0.2/troubleshooting.md b/website/versioned_docs/version-1.0.2/troubleshooting.md
index 4696694d41d8..47de1002beae 100644
--- a/website/versioned_docs/version-1.0.2/troubleshooting.md
+++ b/website/versioned_docs/version-1.0.2/troubleshooting.md
@@ -40,7 +40,7 @@ You can increase `hoodie.commits.archival.batch` moving forward to increase the
 In addition, you can increase the difference between the 2 watermark configurations : `hoodie.keep.max.commits` (default : 30)
 and `hoodie.keep.min.commits` (default : 20). This way, you can reduce the number of archive files created and also
 at the same time increase the number of metadata archived per archive file. Note that post 0.7.0 release where we are
-adding consolidated Hudi metadata ([RFC-15](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+15%3A+HUDI+File+Listing+Improvements)),
+adding consolidated Hudi metadata ([RFC-15](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=147427331)),
 the follow up work would involve re-organizing archival metadata so that we can do periodic compactions to control
 file-sizing of these archive files.
 
diff --git a/website/versioned_docs/version-1.0.2/tuning-guide.md b/website/versioned_docs/version-1.0.2/tuning-guide.md
index 4a1f72f1b05f..107fa6e67c70 100644
--- a/website/versioned_docs/version-1.0.2/tuning-guide.md
+++ b/website/versioned_docs/version-1.0.2/tuning-guide.md
@@ -57,7 +57,7 @@ When upsert large input data, hudi spills part of input data to disk when reach
 
 ### How to tune shuffle parallelism of Hudi jobs ?
 
-First, let's understand what the term parallelism means in the context of Hudi jobs. For any Hudi job using Spark, parallelism equals to the number of spark partitions that should be generated for a particular stage in the DAG. To understand more about spark partitions, read this [article](https://www.dezyre.com/article/how-data-partitioning-in-spark-helps-achieve-more-parallelism/297). In spark, each spark partition is mapped to a spark task that can be executed on an executor. Typicall [...]
+First, let's understand what the term parallelism means in the context of Hudi jobs. For any Hudi job using Spark, parallelism equals to the number of spark partitions that should be generated for a particular stage in the DAG. To understand more about spark partitions, read this [article](https://www.projectpro.io/article/how-data-partitioning-in-spark-helps-achieve-more-parallelism/297). In spark, each spark partition is mapped to a spark task that can be executed on an executor. Typic [...]
 
 (Spark Application → N Spark Jobs → M Spark Stages → T Spark Tasks) on (E executors with C cores)
