This is an automated email from the ASF dual-hosted git repository.
yihua pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git
The following commit(s) were added to refs/heads/asf-site by this push:
new 4ce0db3b93 [DOCS] update broken links (#5333)
4ce0db3b93 is described below
commit 4ce0db3b93967158b5e854d8230d71a38e221c77
Author: Bhavani Sudha Saktheeswaran <[email protected]>
AuthorDate: Mon Apr 18 16:22:51 2022 -0700
[DOCS] update broken links (#5333)
Co-authored-by: Bhavani Sudha Saktheeswaran <[email protected]>
---
website/docs/clustering.md | 20 ++++++++++----------
website/docs/concurrency_control.md | 10 +++++-----
website/docs/deployment.md | 8 ++++----
website/docs/faq.md | 8 ++++----
website/docs/flink-quick-start-guide.md | 2 +-
website/docs/flink_configuration.md | 2 +-
website/docs/hoodie_cleaner.md | 2 +-
website/docs/hoodie_deltastreamer.md | 8 ++++----
website/docs/key_generation.md | 2 +-
website/docs/metrics.md | 10 +++++-----
website/docs/performance.md | 8 ++++----
website/docs/query_engine_setup.md | 2 +-
website/docs/querying_data.md | 8 ++++----
website/docs/quick-start-guide.md | 12 ++++++------
website/docs/use_cases.md | 4 ++--
website/docs/write_operations.md | 2 +-
website/docs/writing_data.md | 22 +++++++++++-----------
17 files changed, 65 insertions(+), 65 deletions(-)
diff --git a/website/docs/clustering.md b/website/docs/clustering.md
index f210a15b1b..9e157de785 100644
--- a/website/docs/clustering.md
+++ b/website/docs/clustering.md
@@ -12,7 +12,7 @@ Apache Hudi brings stream processing to big data, providing
fresh data while bei
## Clustering Architecture
-At a high level, Hudi provides different operations such as
insert/upsert/bulk_insert through it’s write client API to be able to write
data to a Hudi table. To be able to choose a trade-off between file size and
ingestion speed, Hudi provides a knob `hoodie.parquet.small.file.limit` to be
able to configure the smallest allowable file size. Users are able to configure
the small file [soft
limit](https://hudi.apache.org/docs/configurations#compactionSmallFileSize) to
`0` to force new data [...]
+At a high level, Hudi provides different operations such as
insert/upsert/bulk_insert through its write client API to be able to write
data to a Hudi table. To be able to choose a trade-off between file size and
ingestion speed, Hudi provides a knob `hoodie.parquet.small.file.limit` to be
able to configure the smallest allowable file size. Users are able to configure
the small file [soft
limit](https://hudi.apache.org/docs/configurations/#hoodieparquetsmallfilelimit)
to `0` to force new [...]
@@ -95,12 +95,12 @@ broadly classified into three types: clustering plan
strategy, execution strateg
This strategy comes into play while creating clustering plan. It helps to
decide what file groups should be clustered.
Let's look at different plan strategies that are available with Hudi. Note
that these strategies are easily pluggable
-using this
[config](/docs/next/configurations#hoodieclusteringplanstrategyclass).
+using this [config](/docs/configurations#hoodieclusteringplanstrategyclass).
1. `SparkSizeBasedClusteringPlanStrategy`: It selects file slices based on
- the [small file
limit](/docs/next/configurations/#hoodieclusteringplanstrategysmallfilelimit)
+ the [small file
limit](/docs/configurations/#hoodieclusteringplanstrategysmallfilelimit)
of base files and creates clustering groups upto max file size allowed per
group. The max size can be specified using
- this
[config](/docs/next/configurations/#hoodieclusteringplanstrategymaxbytespergroup).
This
+ this
[config](/docs/configurations/#hoodieclusteringplanstrategymaxbytespergroup).
This
strategy is useful for stitching together medium-sized files into larger
ones to reduce lot of files spread across
cold partitions.
2. `SparkRecentDaysClusteringPlanStrategy`: It looks back previous 'N' days
partitions and creates a plan that will
@@ -122,12 +122,12 @@ All the strategies are partition-aware and the latter two
are still bound by the
### Execution Strategy
After building the clustering groups in the planning phase, Hudi applies
execution strategy, for each group, primarily
-based on sort columns and size. The strategy can be specified using this
[config](/docs/next/configurations/#hoodieclusteringexecutionstrategyclass).
+based on sort columns and size. The strategy can be specified using this
[config](/docs/configurations/#hoodieclusteringexecutionstrategyclass).
`SparkSortAndSizeExecutionStrategy` is the default strategy. Users can specify
the columns to sort the data by, when
clustering using
-this
[config](/docs/next/configurations/#hoodieclusteringplanstrategysortcolumns).
Apart from
-that, we can also set [max file
size](/docs/next/configurations/#hoodieparquetmaxfilesize)
+this [config](/docs/configurations/#hoodieclusteringplanstrategysortcolumns).
Apart from
+that, we can also set [max file
size](/docs/configurations/#hoodieparquetmaxfilesize)
for the parquet files produced due to clustering. The strategy uses bulk
insert to write data into new files, in which
case, Hudi implicitly uses a partitioner that does sorting based on specified
columns. In this way, the strategy changes
the data layout in a way that not only improves query performance but also
balance rewrite overhead automatically.
@@ -135,19 +135,19 @@ the data layout in a way that not only improves query
performance but also balan
Now this strategy can be executed either as a single spark job or multiple
jobs depending on number of clustering groups
created in the planning phase. By default, Hudi will submit multiple spark
jobs and union the results. In case you want
to force Hudi to use single spark job, set the execution strategy
-class
[config](/docs/next/configurations/#hoodieclusteringexecutionstrategyclass)
+class [config](/docs/configurations/#hoodieclusteringexecutionstrategyclass)
to `SingleSparkJobExecutionStrategy`.
### Update Strategy
Currently, clustering can only be scheduled for tables/partitions not
receiving any concurrent updates. By default,
-the [config for update
strategy](/docs/next/configurations/#hoodieclusteringupdatesstrategy) is
+the [config for update
strategy](/docs/configurations/#hoodieclusteringupdatesstrategy) is
set to ***SparkRejectUpdateStrategy***. If some file group has updates during
clustering then it will reject updates and
throw an exception. However, in some use-cases updates are very sparse and do
not touch most file groups. The default
strategy to simply reject updates does not seem fair. In such use-cases, users
can set the config to ***SparkAllowUpdateStrategy***.
We discussed the critical strategy configurations. All other configurations
related to clustering are
-listed [here](/docs/next/configurations/#Clustering-Configs). Out of this
list, a few
+listed [here](/docs/configurations/#Clustering-Configs). Out of this list, a
few
configurations that will be very useful are:
| Config key | Remarks | Default |
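The clustering knobs touched by this hunk can be gathered into one options map before being passed to a writer. A minimal, hypothetical sketch follows: the key names are derived from the configuration-page anchors in the links above, while every value (and the `SparkAllowUpdateStrategy` class path) is purely illustrative, not a recommendation.

```python
# Illustrative only: clustering-related Hudi write options discussed above.
# Key names follow the /docs/configurations anchors; values are made-up examples.
clustering_opts = {
    # Soft limit of 0 forces new data into new file groups (per the text above).
    "hoodie.parquet.small.file.limit": "0",
    # Base-file size below which a file slice is a clustering candidate.
    "hoodie.clustering.plan.strategy.small.file.limit": str(600 * 1024 * 1024),
    # Cap on total bytes per clustering group.
    "hoodie.clustering.plan.strategy.max.bytes.per.group": str(2 * 1024 * 1024 * 1024),
    # Columns the execution strategy sorts by when rewriting files.
    "hoodie.clustering.plan.strategy.sort.columns": "region,city",
    # Hypothetical fully-qualified class path for the permissive update strategy.
    "hoodie.clustering.updates.strategy":
        "org.apache.hudi.client.clustering.update.strategy.SparkAllowUpdateStrategy",
}

# These would typically be supplied as .option(key, value) pairs
# on a Spark datasource write.
for key, value in clustering_opts.items():
    print(f"{key}={value}")
```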
diff --git a/website/docs/concurrency_control.md
b/website/docs/concurrency_control.md
index a9a0d5860c..e71cb4a8f2 100644
--- a/website/docs/concurrency_control.md
+++ b/website/docs/concurrency_control.md
@@ -19,13 +19,13 @@ between multiple table service writers and readers.
Additionally, using MVCC, Hu
the same Hudi Table. Hudi supports `file level OCC`, i.e., for any 2 commits
(or writers) happening to the same table, if they do not have writes to
overlapping files being changed, both writers are allowed to succeed.
This feature is currently *experimental* and requires either Zookeeper or
HiveMetastore to acquire locks.
-It may be helpful to understand the different guarantees provided by [write
operations](/docs/writing_data#write-operations) via Hudi datasource or the
delta streamer.
+It may be helpful to understand the different guarantees provided by [write
operations](/docs/write_operations/) via Hudi datasource or the delta streamer.
## Single Writer Guarantees
- *UPSERT Guarantee*: The target table will NEVER show duplicates.
- - *INSERT Guarantee*: The target table wilL NEVER have duplicates if
[dedup](/docs/configurations#INSERT_DROP_DUPS_OPT_KEY) is enabled.
- - *BULK_INSERT Guarantee*: The target table will NEVER have duplicates if
[dedup](/docs/configurations#INSERT_DROP_DUPS_OPT_KEY) is enabled.
+ - *INSERT Guarantee*: The target table will NEVER have duplicates if
[dedup](/docs/configurations#hoodiedatasourcewriteinsertdropduplicates) is
enabled.
+ - *BULK_INSERT Guarantee*: The target table will NEVER have duplicates if
[dedup](/docs/configurations#hoodiedatasourcewriteinsertdropduplicates) is
enabled.
- *INCREMENTAL PULL Guarantee*: Data consumption and checkpoints are NEVER
out of order.
## Multi Writer Guarantees
@@ -33,8 +33,8 @@ It may be helpful to understand the different guarantees
provided by [write oper
With multiple writers using OCC, some of the above guarantees change as follows
- *UPSERT Guarantee*: The target table will NEVER show duplicates.
-- *INSERT Guarantee*: The target table MIGHT have duplicates even if
[dedup](/docs/configurations#INSERT_DROP_DUPS_OPT_KEY) is enabled.
-- *BULK_INSERT Guarantee*: The target table MIGHT have duplicates even if
[dedup](/docs/configurations#INSERT_DROP_DUPS_OPT_KEY) is enabled.
+- *INSERT Guarantee*: The target table MIGHT have duplicates even if
[dedup](/docs/configurations#hoodiedatasourcewriteinsertdropduplicates) is
enabled.
+- *BULK_INSERT Guarantee*: The target table MIGHT have duplicates even if
[dedup](/docs/configurations#hoodiedatasourcewriteinsertdropduplicates) is
enabled.
- *INCREMENTAL PULL Guarantee*: Data consumption and checkpoints MIGHT be out
of order due to multiple writer jobs finishing at different times.
## Enabling Multi Writing
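As a concrete illustration of the multi-writer guarantees above, OCC is enabled through write-concurrency properties; a hedged sketch, assuming the Zookeeper-based lock provider mentioned earlier (key names as in the Hudi configuration docs; the host, port, and path values are placeholders):

```properties
# Sketch only: enable optimistic concurrency control with Zookeeper locks.
# Property names per the Hudi configuration reference; values are placeholders.
hoodie.write.concurrency.mode=optimistic_concurrency_control
hoodie.cleaner.policy.failed.writes=LAZY
hoodie.write.lock.provider=org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider
hoodie.write.lock.zookeeper.url=zk-host
hoodie.write.lock.zookeeper.port=2181
hoodie.write.lock.zookeeper.lock_key=my_table
hoodie.write.lock.zookeeper.base_path=/hudi_locks
```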
diff --git a/website/docs/deployment.md b/website/docs/deployment.md
index a33c30a951..739480205d 100644
--- a/website/docs/deployment.md
+++ b/website/docs/deployment.md
@@ -25,9 +25,9 @@ With Merge_On_Read Table, Hudi ingestion needs to also take
care of compacting d
### DeltaStreamer
-[DeltaStreamer](/docs/writing_data#deltastreamer) is the standalone utility to
incrementally pull upstream changes from varied sources such as DFS, Kafka and
DB Changelogs and ingest them to hudi tables. It runs as a spark application in
2 modes.
+[DeltaStreamer](/docs/hoodie_deltastreamer#deltastreamer) is the standalone
utility to incrementally pull upstream changes from varied sources such as DFS,
Kafka and DB Changelogs and ingest them to hudi tables. It runs as a spark
application in 2 modes.
- - **Run Once Mode** : In this mode, Deltastreamer performs one ingestion
round which includes incrementally pulling events from upstream sources and
ingesting them to hudi table. Background operations like cleaning old file
versions and archiving hoodie timeline are automatically executed as part of
the run. For Merge-On-Read tables, Compaction is also run inline as part of
ingestion unless disabled by passing the flag "--disable-compaction". By
default, Compaction is run inline for eve [...]
+ - **Run Once Mode** : In this mode, Deltastreamer performs one ingestion
round which includes incrementally pulling events from upstream sources and
ingesting them to hudi table. Background operations like cleaning old file
versions and archiving hoodie timeline are automatically executed as part of
the run. For Merge-On-Read tables, Compaction is also run inline as part of
ingestion unless disabled by passing the flag "--disable-compaction". By
default, Compaction is run inline for eve [...]
Here is an example invocation for reading from kafka topic in a single-run
mode and writing to Merge On Read table type in a yarn cluster.
@@ -126,7 +126,7 @@ Here is an example invocation for reading from kafka topic
in a continuous mode
### Spark Datasource Writer Jobs
-As described in [Writing Data](/docs/writing_data#datasource-writer), you can
use spark datasource to ingest to hudi table. This mechanism allows you to
ingest any spark dataframe in Hudi format. Hudi Spark DataSource also supports
spark streaming to ingest a streaming source to Hudi table. For Merge On Read
table types, inline compaction is turned on by default which runs after every
ingestion run. The compaction frequency can be changed by setting the property
"hoodie.compact.inline.ma [...]
+As described in [Writing Data](/docs/writing_data#spark-datasource-writer),
you can use spark datasource to ingest to hudi table. This mechanism allows you
to ingest any spark dataframe in Hudi format. Hudi Spark DataSource also
supports spark streaming to ingest a streaming source to Hudi table. For Merge
On Read table types, inline compaction is turned on by default which runs after
every ingestion run. The compaction frequency can be changed by setting the
property "hoodie.compact.inl [...]
Here is an example invocation using spark datasource
@@ -144,7 +144,7 @@ inputDF.write()
## Upgrading
-New Hudi releases are listed on the [releases page](/releases), with detailed
notes which list all the changes, with highlights in each release.
+New Hudi releases are listed on the [releases page](/releases/download), with
detailed notes which list all the changes, with highlights in each release.
At the end of the day, Hudi is a storage system and with that comes a lot of
responsibilities, which we take seriously.
As general guidelines,
diff --git a/website/docs/faq.md b/website/docs/faq.md
index c675788561..cee9e583e5 100644
--- a/website/docs/faq.md
+++ b/website/docs/faq.md
@@ -83,7 +83,7 @@ At a high level, Hudi is based on MVCC design that writes
data to versioned parq
### What are some ways to write a Hudi dataset?
-Typically, you obtain a set of partial updates/inserts from your source and
issue [write operations](https://hudi.apache.org/docs/writing_data/) against a
Hudi dataset. If you ingesting data from any of the standard sources like
Kafka, or tailing DFS, the [delta
streamer](https://hudi.apache.org/docs/writing_data/#deltastreamer) tool is
invaluable and provides an easy, self-managed solution to getting data written
into Hudi. You can also write your own code to capture data from a custom [...]
+Typically, you obtain a set of partial updates/inserts from your source and
issue [write operations](https://hudi.apache.org/docs/write_operations/)
against a Hudi dataset. If you are ingesting data from any of the standard
sources like Kafka, or tailing DFS, the [delta
streamer](https://hudi.apache.org/docs/hoodie_deltastreamer#deltastreamer) tool
is invaluable and provides an easy, self-managed solution to getting data
written into Hudi. You can also write your own code to capture data fr [...]
### How is a Hudi job deployed?
@@ -225,7 +225,7 @@ set
hive.input.format=org.apache.hudi.hadoop.hive.HoodieCombineHiveInputFormat
### Can I register my Hudi dataset with Apache Hive metastore?
-Yes. This can be performed either via the standalone [Hive Sync
tool](https://hudi.apache.org/docs/writing_data/#syncing-to-hive) or using
options in
[deltastreamer](https://github.com/apache/hudi/blob/d3edac4612bde2fa9deca9536801dbc48961fb95/docker/demo/sparksql-incremental.commands#L50)
tool or
[datasource](https://hudi.apache.org/docs/configurations#hoodiedatasourcehive_syncenable).
+Yes. This can be performed either via the standalone [Hive Sync
tool](https://hudi.apache.org/docs/syncing_metastore#hive-sync-tool) or using
options in
[deltastreamer](https://github.com/apache/hudi/blob/d3edac4612bde2fa9deca9536801dbc48961fb95/docker/demo/sparksql-incremental.commands#L50)
tool or
[datasource](https://hudi.apache.org/docs/configurations#hoodiedatasourcehive_syncenable).
### How does the Hudi indexing work & what are its benefits?
@@ -255,7 +255,7 @@ That said, for obvious reasons of not blocking ingesting
for compaction, you may
### What performance/ingest latency can I expect for Hudi writing?
-The speed at which you can write into Hudi depends on the [write
operation](https://hudi.apache.org/docs/writing_data/) and some trade-offs you
make along the way like file sizing. Just like how databases incur overhead
over direct/raw file I/O on disks, Hudi operations may have overhead from
supporting database like features compared to reading/writing raw DFS files.
That said, Hudi implements advanced techniques from database literature to keep
these minimal. User is encouraged to ha [...]
+The speed at which you can write into Hudi depends on the [write
operation](https://hudi.apache.org/docs/write_operations) and some trade-offs
you make along the way like file sizing. Just like how databases incur overhead
over direct/raw file I/O on disks, Hudi operations may have overhead from
supporting database like features compared to reading/writing raw DFS files.
That said, Hudi implements advanced techniques from database literature to keep
these minimal. User is encouraged to [...]
| Storage Type | Type of workload | Performance | Tips |
|-------|--------|--------|--------|
@@ -364,7 +364,7 @@ spark.read.parquet("your_data_set/path/to/month").limit(n)
// Limit n records
.save(basePath);
```
-For merge on read table, you may want to also try scheduling and running
compaction jobs. You can run compaction directly using spark submit on
org.apache.hudi.utilities.HoodieCompactor or by using [HUDI
CLI](https://hudi.apache.org/docs/deployment/#cli).
+For merge on read table, you may want to also try scheduling and running
compaction jobs. You can run compaction directly using spark submit on
org.apache.hudi.utilities.HoodieCompactor or by using [HUDI
CLI](https://hudi.apache.org/docs/cli).
### If I keep my file versions at 1, with this configuration will i be able to
do a roll back (to the last commit) when write fail?
diff --git a/website/docs/flink-quick-start-guide.md
b/website/docs/flink-quick-start-guide.md
index a723b8ed7b..daec4ba0b5 100644
--- a/website/docs/flink-quick-start-guide.md
+++ b/website/docs/flink-quick-start-guide.md
@@ -31,7 +31,7 @@ Start a standalone Flink cluster within hadoop environment.
Before you start up the cluster, we suggest to config the cluster as follows:
- in `$FLINK_HOME/conf/flink-conf.yaml`, add config option
`taskmanager.numberOfTaskSlots: 4`
-- in `$FLINK_HOME/conf/flink-conf.yaml`, [add other global configurations
according to the characteristics of your task](#flink-configuration)
+- in `$FLINK_HOME/conf/flink-conf.yaml`, [add other global configurations
according to the characteristics of your
task](flink_configuration#global-configurations)
- in `$FLINK_HOME/conf/workers`, add item `localhost` as 4 lines so that there
are 4 workers on the local cluster
Now starts the cluster:
diff --git a/website/docs/flink_configuration.md
b/website/docs/flink_configuration.md
index ba7853d7cd..d615281a6b 100644
--- a/website/docs/flink_configuration.md
+++ b/website/docs/flink_configuration.md
@@ -60,7 +60,7 @@ allocated with enough memory, we can try to set these memory
options.
| `write.bucket_assign.tasks` | The parallelism of bucket assigner
operators. No default value, using Flink `parallelism.default` |
[`parallelism.default`](#parallelism) | Increases the parallelism also
increases the number of buckets, thus the number of small files (small buckets)
|
| `write.index_boostrap.tasks` | The parallelism of index bootstrap.
Increasing parallelism can speed up the efficiency of the bootstrap stage. The
bootstrap stage will block checkpointing. Therefore, it is necessary to set
more checkpoint failure tolerance times. Default using Flink
`parallelism.default` | [`parallelism.default`](#parallelism) | It only take
effect when `index.bootsrap.enabled` is `true` |
| `read.tasks` | The parallelism of read operators (batch and stream). Default
`4` | `4` | |
-| `compaction.tasks` | The parallelism of online compaction. Default `4` | `4`
| `Online compaction` will occupy the resources of the write task. It is
recommended to use [`offline compaction`](#offline-compaction) |
+| `compaction.tasks` | The parallelism of online compaction. Default `4` | `4`
| `Online compaction` will occupy the resources of the write task. It is
recommended to use [`offline
compaction`](/docs/compaction/#flink-offline-compaction) |
### Compaction
diff --git a/website/docs/hoodie_cleaner.md b/website/docs/hoodie_cleaner.md
index 41956f566c..10f1aa2450 100644
--- a/website/docs/hoodie_cleaner.md
+++ b/website/docs/hoodie_cleaner.md
@@ -47,7 +47,7 @@ hoodie.clean.async=true
```
### CLI
-You can also use [Hudi CLI](https://hudi.apache.org/docs/deployment#cli) to
run Hoodie Cleaner.
+You can also use [Hudi CLI](/docs/cli) to run Hoodie Cleaner.
CLI provides the below commands for cleaner service:
- `cleans show`
diff --git a/website/docs/hoodie_deltastreamer.md
b/website/docs/hoodie_deltastreamer.md
index f212f57859..3c49bd2bbf 100644
--- a/website/docs/hoodie_deltastreamer.md
+++ b/website/docs/hoodie_deltastreamer.md
@@ -374,7 +374,7 @@ frequent `file handle` switching.
:::note
The parallelism of `bulk_insert` is specified by `write.tasks`. The
parallelism will affect the number of small files.
In theory, the parallelism of `bulk_insert` is the number of `bucket`s (In
particular, when each bucket writes to maximum file size, it
-will rollover to the new file handle. Finally, `the number of files` >=
[`write.bucket_assign.tasks`](#parallelism)).
+will rollover to the new file handle. Finally, `the number of files` >=
[`write.bucket_assign.tasks`](/docs/configurations#writebucket_assigntasks).
:::
#### Options
@@ -382,9 +382,9 @@ will rollover to the new file handle. Finally, `the number
of files` >= [`write.
| Option Name | Required | Default | Remarks |
| ----------- | ------- | ------- | ------- |
| `write.operation` | `true` | `upsert` | Setting as `bulk_insert` to open
this function |
-| `write.tasks` | `false` | `4` | The parallelism of `bulk_insert`, `the
number of files` >= [`write.bucket_assign.tasks`](#parallelism) |
-| `write.bulk_insert.shuffle_by_partition` | `false` | `true` | Whether to
shuffle data according to the partition field before writing. Enabling this
option will reduce the number of small files, but there may be a risk of data
skew |
-| `write.bulk_insert.sort_by_partition` | `false` | `true` | Whether to sort
data according to the partition field before writing. Enabling this option will
reduce the number of small files when a write task writes multiple partitions |
+| `write.tasks` | `false` | `4` | The parallelism of `bulk_insert`, `the
number of files` >=
[`write.bucket_assign.tasks`](/docs/configurations#writebucket_assigntasks) |
+| `write.bulk_insert.shuffle_by_partition` | `false` | `true` | Whether to
shuffle data according to the partition field before writing. Enabling this
option will reduce the number of small files, but there may be a risk of data
skew |
+| `write.bulk_insert.sort_by_partition` | `false` | `true` | Whether to sort
data according to the partition field before writing. Enabling this option will
reduce the number of small files when a write task writes multiple partitions |
| `write.sort.memory` | `false` | `128` | Available managed memory of sort
operator. default `128` MB |
### Index Bootstrap
diff --git a/website/docs/key_generation.md b/website/docs/key_generation.md
index f20e4d77a1..1dcb020645 100644
--- a/website/docs/key_generation.md
+++ b/website/docs/key_generation.md
@@ -17,7 +17,7 @@ Hudi provides several key generators out of the box that
users can use based on
implementation for users to implement and use their own KeyGenerator. This
page goes over all different types of key
generators that are readily available to use.
-[Here](https://github.com/apache/hudi/blob/master/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/keygen/KeyGenerator.java)
+[Here](https://github.com/apache/hudi/blob/6f9b02decb5bb2b83709b1b6ec04a97e4d102c11/hudi-common/src/main/java/org/apache/hudi/keygen/KeyGenerator.java)
is the interface for KeyGenerator in Hudi for your reference.
Before diving into different types of key generators, let’s go over some of
the common configs required to be set for
diff --git a/website/docs/metrics.md b/website/docs/metrics.md
index 17441447fa..4a831d7981 100644
--- a/website/docs/metrics.md
+++ b/website/docs/metrics.md
@@ -6,7 +6,7 @@ toc: true
last_modified_at: 2020-06-20T15:59:57-04:00
---
-In this section, we will introduce the `MetricsReporter` and `HoodieMetrics`
in Hudi. You can view the metrics-related configurations
[here](configurations#metrics-configs).
+In this section, we will introduce the `MetricsReporter` and `HoodieMetrics`
in Hudi. You can view the metrics-related configurations
[here](configurations#METRICS).
## MetricsReporter
@@ -17,7 +17,7 @@ MetricsReporter provides APIs for reporting `HoodieMetrics`
to user-specified ba
JmxMetricsReporter is an implementation of JMX reporter, which used to report
JMX metrics.
#### Configurations
-The following is an example of `JmxMetricsReporter`. More detaile
configurations can be referenced [here](configurations#jmx).
+The following is an example of `JmxMetricsReporter`. More detailed
configurations can be referenced
[here](configurations#Metrics-Configurations-for-Jmx).
```properties
hoodie.metrics.on=true
@@ -37,7 +37,7 @@ As configured above, JmxMetricsReporter will started JMX
server on port 4001. We
MetricsGraphiteReporter is an implementation of Graphite reporter, which
connects to a Graphite server, and send `HoodieMetrics` to it.
#### Configurations
-The following is an example of `MetricsGraphiteReporter`. More detaile
configurations can be referenced [here](configurations#graphite).
+The following is an example of `MetricsGraphiteReporter`. More detailed
configurations can be referenced
[here](configurations#Metrics-Configurations-for-Graphite).
```properties
hoodie.metrics.on=true
@@ -58,7 +58,7 @@ DatadogMetricsReporter is an implementation of Datadog
reporter.
A reporter which publishes metric values to Datadog monitoring service via
Datadog HTTP API.
#### Configurations
-The following is an example of `DatadogMetricsReporter`. More detailed
configurations can be referenced [here](configurations#datadog).
+The following is an example of `DatadogMetricsReporter`. More detailed
configurations can be referenced
[here](configurations#Metrics-Configurations-for-Datadog-reporter).
```properties
hoodie.metrics.on=true
@@ -138,7 +138,7 @@ tuned are in the `HoodieMetricsCloudWatchConfig` class.
Allows users to define a custom metrics reporter.
#### Configurations
-The following is an example of `UserDefinedMetricsReporter`. More detailed
configurations can be referenced [here](configurations#user-defined-reporter).
+The following is an example of `UserDefinedMetricsReporter`. More detailed
configurations can be referenced [here](configurations#Metrics-Configurations).
```properties
hoodie.metrics.on=true
diff --git a/website/docs/performance.md b/website/docs/performance.md
index 53152730bd..db78a7f25b 100644
--- a/website/docs/performance.md
+++ b/website/docs/performance.md
@@ -14,12 +14,12 @@ column statistics etc. Even on some cloud data stores,
there is often cost to li
Here are some ways to efficiently manage the storage of your Hudi tables.
-- The [small file handling
feature](/docs/configurations#compactionSmallFileSize) in Hudi, profiles
incoming workload
+- The [small file handling
feature](/docs/configurations/#hoodieparquetsmallfilelimit) in Hudi, profiles
incoming workload
and distributes inserts to existing file groups instead of creating new file
groups, which can lead to small files.
-- Cleaner can be [configured](/docs/configurations#retainCommits) to clean up
older file slices, more or less aggressively depending on maximum time for
queries to run & lookback needed for incremental pull
-- User can also tune the size of the [base/parquet
file](/docs/configurations#limitFileSize), [log
files](/docs/configurations#logFileMaxSize) & expected [compression
ratio](/docs/configurations#parquetCompressionRatio),
+- Cleaner can be
[configured](/docs/configurations#hoodiecleanercommitsretained) to clean up
older file slices, more or less aggressively depending on maximum time for
queries to run & lookback needed for incremental pull
+- User can also tune the size of the [base/parquet
file](/docs/configurations#hoodieparquetmaxfilesize), [log
files](/docs/configurations#hoodielogfilemaxsize) & expected [compression
ratio](/docs/configurations#hoodieparquetcompressionratio),
such that sufficient number of inserts are grouped into the same file group,
resulting in well sized base files ultimately.
-- Intelligently tuning the [bulk insert
parallelism](/docs/configurations#withBulkInsertParallelism), can again in
nicely sized initial file groups. It is in fact critical to get this right,
since the file groups
+- Intelligently tuning the [bulk insert
parallelism](/docs/configurations#hoodiebulkinsertshuffleparallelism) can
again result in nicely sized initial file groups. It is in fact critical to get
this right, since the file groups
once created cannot be deleted, but simply expanded as explained before.
- For workloads with heavy updates, the [merge-on-read
table](/docs/concepts#merge-on-read-table) provides a nice mechanism for
ingesting quickly into smaller files and then later merging them into larger
base files via compaction.
diff --git a/website/docs/query_engine_setup.md
b/website/docs/query_engine_setup.md
index d89a96d042..8d555dae3e 100644
--- a/website/docs/query_engine_setup.md
+++ b/website/docs/query_engine_setup.md
@@ -64,7 +64,7 @@ To query Hudi tables on Trino, please place the
`hudi-presto-bundle` jar into th
## Hive
In order for Hive to recognize Hudi tables and query correctly,
-- the HiveServer2 needs to be provided with the
`hudi-hadoop-mr-bundle-x.y.z-SNAPSHOT.jar` in its [aux jars
path](https://www.cloudera.com/documentation/enterprise/5-6-x/topics/cm_mc_hive_udf#concept_nc3_mms_lr).
This will ensure the input format
+- the HiveServer2 needs to be provided with the
`hudi-hadoop-mr-bundle-x.y.z-SNAPSHOT.jar` in its [aux jars
path](https://www.cloudera.com/documentation/enterprise/5-6-x/topics/cm_mc_hive_udf.html#concept_nc3_mms_lr).
This will ensure the input format
classes with its dependencies are available for query planning & execution.
- For MERGE_ON_READ tables, additionally the bundle needs to be put on the
hadoop/hive installation across the cluster, so that queries can pick up the
custom RecordReader as well.
diff --git a/website/docs/querying_data.md b/website/docs/querying_data.md
index c516708e7d..1b5cee0d5b 100644
--- a/website/docs/querying_data.md
+++ b/website/docs/querying_data.md
@@ -49,7 +49,7 @@ spark.sql("select `_hoodie_commit_time`, fare, begin_lon,
begin_lat, ts from hu
```
For examples, refer to [Incremental
Queries](/docs/quick-start-guide#incremental-query) in the Spark quickstart.
-Please refer to [configurations](/docs/configurations#spark-datasource)
section, to view all datasource options.
+Please refer to [configurations](/docs/configurations#SPARK_DATASOURCE)
section, to view all datasource options.
Additionally, `HoodieReadClient` offers the following functionality using
Hudi's implicit indexing.
@@ -170,16 +170,16 @@ would ensure Map Reduce execution is chosen for a Hive
query, which combines par
separated) and calls InputFormat.listStatus() only once with all those
partitions.
## PrestoDB
-To setup PrestoDB for querying Hudi, see the [Query Engine
Setup](/docs/query_engine_setup#PrestoDB) page.
+To setup PrestoDB for querying Hudi, see the [Query Engine
Setup](/docs/query_engine_setup#prestodb) page.
## Trino
-To setup Trino for querying Hudi, see the [Query Engine
Setup](/docs/query_engine_setup#Trino) page.
+To setup Trino for querying Hudi, see the [Query Engine
Setup](/docs/query_engine_setup#trino) page.
## Impala (3.4 or later)
### Snapshot Query
-Impala is able to query Hudi Copy-on-write table as an [EXTERNAL TABLE](https://docs.cloudera.com/documentation/enterprise/6/6.3/topics/impala_tables#external_tables) on HDFS.
+Impala is able to query Hudi Copy-on-write table as an [EXTERNAL TABLE](https://docs.cloudera.com/documentation/enterprise/6/6.3/topics/impala_tables.html#external_tables) on HDFS.
To create a Hudi read optimized table on Impala:
```
diff --git a/website/docs/quick-start-guide.md b/website/docs/quick-start-guide.md
index 6446016254..51d0b838f7 100644
--- a/website/docs/quick-start-guide.md
+++ b/website/docs/quick-start-guide.md
@@ -412,12 +412,12 @@ df.write.format("hudi").
:::info
`mode(Overwrite)` overwrites and recreates the table if it already exists.
You can check the data generated under `/tmp/hudi_trips_cow/<region>/<country>/<city>/`. We provided a record key
-(`uuid` in [schema](https://github.com/apache/hudi/blob/master/hudi-spark/src/main/java/org/apache/hudi/QuickstartUtils.java#L58)), partition field (`region/country/city`) and combine logic (`ts` in
-[schema](https://github.com/apache/hudi/blob/master/hudi-spark/src/main/java/org/apache/hudi/QuickstartUtils.java#L58)) to ensure trip records are unique within each partition. For more info, refer to
-[Modeling data stored in Hudi](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=113709185#FAQ-HowdoImodelthedatastoredinHudi)
+(`uuid` in [schema](https://github.com/apache/hudi/blob/2e6e302efec2fa848ded4f88a95540ad2adb7798/hudi-spark-datasource/hudi-spark/src/main/java/org/apache/hudi/QuickstartUtils.java#L60)), partition field (`region/country/city`) and combine logic (`ts` in
+[schema](https://github.com/apache/hudi/blob/2e6e302efec2fa848ded4f88a95540ad2adb7798/hudi-spark-datasource/hudi-spark/src/main/java/org/apache/hudi/QuickstartUtils.java#L60)) to ensure trip records are unique within each partition. For more info, refer to
+[Modeling data stored in Hudi](https://hudi.apache.org/learn/faq/#how-do-i-model-the-data-stored-in-hudi)
and for info on ways to ingest data into Hudi, refer to [Writing Hudi Tables](/docs/writing_data).
Here we are using the default write operation : `upsert`. If you have a workload without updates, you can also issue
-`insert` or `bulk_insert` operations which could be faster. To know more, refer to [Write operations](/docs/writing_data#write-operations)
+`insert` or `bulk_insert` operations which could be faster. To know more, refer to [Write operations](/docs/write_operations)
:::
</TabItem>
@@ -453,7 +453,7 @@ You can check the data generated under `/tmp/hudi_trips_cow/<region>/<country>/<
[Modeling data stored in Hudi](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=113709185#FAQ-HowdoImodelthedatastoredinHudi)
and for info on ways to ingest data into Hudi, refer to [Writing Hudi Tables](/docs/writing_data).
Here we are using the default write operation : `upsert`. If you have a workload without updates, you can also issue
-`insert` or `bulk_insert` operations which could be faster. To know more, refer to [Write operations](/docs/writing_data#write-operations)
+`insert` or `bulk_insert` operations which could be faster. To know more, refer to [Write operations](/docs/write_operations)
:::
</TabItem>
@@ -1117,7 +1117,7 @@ more details please refer to [procedures](/docs/next/procedures).
You can also do the quickstart by [building hudi yourself](https://github.com/apache/hudi#building-apache-hudi-from-source),
and using `--jars <path to hudi_code>/packaging/hudi-spark-bundle/target/hudi-spark-bundle_2.1?-*.*.*-SNAPSHOT.jar` in the spark-shell command above
-instead of `--packages org.apache.hudi:hudi-spark3.1.2-bundle_2.12:0.10.1`. Hudi also supports scala 2.12. Refer [build with scala 2.12](https://github.com/apache/hudi#build-with-scala-212)
+instead of `--packages org.apache.hudi:hudi-spark3.1.2-bundle_2.12:0.10.1`. Hudi also supports scala 2.12. Refer [build with scala 2.12](https://github.com/apache/hudi#build-with-different-spark-versions)
for more info.
Also, we used Spark here to show case the capabilities of Hudi. However, Hudi can support multiple table types/query types and
diff --git a/website/docs/use_cases.md b/website/docs/use_cases.md
index 3758d7208e..f3fabdf04d 100644
--- a/website/docs/use_cases.md
+++ b/website/docs/use_cases.md
@@ -15,7 +15,7 @@ This blog post outlines this use case in more depth - https://hudi.apache.org/bl
### Near Real-Time Ingestion
-Ingesting data from OLTP sources like (event logs, databases, external sources) into a [Data Lake](http://martinfowler.com/bliki/DataLake) is a common problem,
+Ingesting data from OLTP sources like (event logs, databases, external sources) into a [Data Lake](http://martinfowler.com/bliki/DataLake.html) is a common problem,
that is unfortunately solved in a piecemeal fashion, using a medley of ingestion tools. This "raw data" layer of the data lake often forms the bedrock on which
more value is created.
@@ -27,7 +27,7 @@ even moderately big installations store billions of rows. It goes without saying
are needed if ingestion is to keep up with the typically high update volumes.
Even for immutable data sources like [Kafka](https://kafka.apache.org), there is often a need to de-duplicate the incoming events against what's stored on DFS.
-Hudi achieves this by [employing indexes](http://hudi.apache.org/blog/hudi-indexing-mechanisms/) of different kinds, quickly and efficiently.
+Hudi achieves this by [employing indexes](http://hudi.apache.org/blog/2020/11/11/hudi-indexing-mechanisms/) of different kinds, quickly and efficiently.
All of this is seamlessly achieved by the Hudi DeltaStreamer tool, which is maintained in tight integration with rest of the code
and we are always trying to add more capture sources, to make this easier for the users. The tool also has a continuous mode, where it
diff --git a/website/docs/write_operations.md b/website/docs/write_operations.md
index ccdac23350..746a93d057 100644
--- a/website/docs/write_operations.md
+++ b/website/docs/write_operations.md
@@ -37,7 +37,7 @@ Hudi supports implementing two types of deletes on data stored in Hudi tables, b
## Writing path
The following is an inside look on the Hudi write path and the sequence of events that occur during a write.
-1. [Deduping](/docs/configurations/#writeinsertdeduplicate)
+1. [Deduping](/docs/configurations#hoodiecombinebeforeinsert)
1. First your input records may have duplicate keys within the same batch and duplicates need to be combined or reduced by key.
2. [Index Lookup](/docs/next/indexing)
1. Next, an index lookup is performed to try and match the input records to identify which file groups they belong to.
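
The deduping step described above (combine records sharing a key, keeping the one with the largest precombine value) can be sketched roughly as follows. This is a hedged illustration of the semantics only, not Hudi's actual code path; the field names `uuid` and `ts` follow the quickstart schema:

```python
# Hypothetical sketch of the "combine before insert" dedup step: within
# one batch, keep only the record with the largest precombine field
# (e.g. `ts`) per record key. Illustrative only, not Hudi internals.
def dedup_batch(records, key_field="uuid", precombine_field="ts"):
    latest = {}
    for rec in records:
        key = rec[key_field]
        if key not in latest or rec[precombine_field] > latest[key][precombine_field]:
            latest[key] = rec
    return list(latest.values())

batch = [
    {"uuid": "a", "ts": 1, "fare": 10.0},
    {"uuid": "a", "ts": 3, "fare": 12.5},  # duplicate key; newer ts wins
    {"uuid": "b", "ts": 2, "fare": 7.0},
]
deduped = dedup_batch(batch)  # two records survive: latest "a" and "b"
```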
diff --git a/website/docs/writing_data.md b/website/docs/writing_data.md
index 15fcc4d66b..8765222b21 100644
--- a/website/docs/writing_data.md
+++ b/website/docs/writing_data.md
@@ -9,7 +9,7 @@ import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
In this section, we will cover ways to ingest new changes from external sources or even other Hudi tables.
-The two main tools available are the [DeltaStreamer](#deltastreamer) tool, as well as the [Spark Hudi datasource](#datasource-writer).
+The two main tools available are the [DeltaStreamer](/docs/hoodie_deltastreamer#deltastreamer) tool, as well as the [Spark Hudi datasource](#spark-datasource-writer).
## Spark Datasource Writer
@@ -31,7 +31,7 @@ Default value: `"partitionpath"`<br/>
**PRECOMBINE_FIELD_OPT_KEY** (Required): When two records within the same batch have the same key value, the record with the largest value from the field specified will be chosen. If you are using the default payload of OverwriteWithLatestAvroPayload for HoodieRecordPayload (`WRITE_PAYLOAD_CLASS`), an incoming record will always take precedence over the one in storage, ignoring this `PRECOMBINE_FIELD_OPT_KEY`. <br/>
Default value: `"ts"`<br/>
-**OPERATION_OPT_KEY**: The [write operations](#write-operations) to use.<br/>
+**OPERATION_OPT_KEY**: The [write operations](/docs/write_operations) to use.<br/>
Available values:<br/>
`UPSERT_OPERATION_OPT_VAL` (default), `BULK_INSERT_OPERATION_OPT_VAL`, `INSERT_OPERATION_OPT_VAL`, `DELETE_OPERATION_OPT_VAL`
@@ -39,7 +39,7 @@ Available values:<br/>
Available values:<br/>
[`COW_TABLE_TYPE_OPT_VAL`](/docs/concepts#copy-on-write-table) (default), [`MOR_TABLE_TYPE_OPT_VAL`](/docs/concepts#merge-on-read-table)
-**KEYGENERATOR_CLASS_OPT_KEY**: Refer to [Key Generation](#key-generation) section below.
+**KEYGENERATOR_CLASS_OPT_KEY**: Refer to [Key Generation](/docs/key_generation) section below.
**HIVE_PARTITION_EXTRACTOR_CLASS_OPT_KEY**: If using hive, specify if the table should or should not be partitioned.<br/>
Available values:<br/>
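
Taken together, the write options discussed above are typically supplied as a map to the Spark datasource. A hedged sketch follows; the string config keys mirror the `OPT_KEY` constants described above (spellings assumed for illustration against a 0.x release — verify against the configurations page for your Hudi version), and the commented-out `df.write` call is Spark-side and not executed here:

```python
# Hypothetical option map for a Spark datasource write. Each string key
# corresponds to one of the OPT_KEY options documented above.
hudi_options = {
    "hoodie.table.name": "hudi_trips_cow",
    "hoodie.datasource.write.recordkey.field": "uuid",         # RECORDKEY_FIELD_OPT_KEY
    "hoodie.datasource.write.partitionpath.field": "partitionpath",
    "hoodie.datasource.write.precombine.field": "ts",          # PRECOMBINE_FIELD_OPT_KEY
    "hoodie.datasource.write.operation": "upsert",             # OPERATION_OPT_KEY
    "hoodie.datasource.write.table.type": "COPY_ON_WRITE",     # TABLE_TYPE_OPT_KEY
}
# In a Spark session this would be passed as:
#   df.write.format("hudi").options(**hudi_options).mode("append").save(base_path)
```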
@@ -88,12 +88,12 @@ df.write.format("hudi").
:::info
`mode(Overwrite)` overwrites and recreates the table if it already exists.
You can check the data generated under `/tmp/hudi_trips_cow/<region>/<country>/<city>/`. We provided a record key
-(`uuid` in [schema](https://github.com/apache/hudi/blob/master/hudi-spark/src/main/java/org/apache/hudi/QuickstartUtils.java#L58)), partition field (`region/country/city`) and combine logic (`ts` in
-[schema](https://github.com/apache/hudi/blob/master/hudi-spark/src/main/java/org/apache/hudi/QuickstartUtils.java#L58)) to ensure trip records are unique within each partition. For more info, refer to
-[Modeling data stored in Hudi](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=113709185#FAQ-HowdoImodelthedatastoredinHudi)
+(`uuid` in [schema](https://github.com/apache/hudi/blob/6f9b02decb5bb2b83709b1b6ec04a97e4d102c11/hudi-spark-datasource/hudi-spark/src/main/java/org/apache/hudi/QuickstartUtils.java#L60)), partition field (`region/country/city`) and combine logic (`ts` in
+[schema](https://github.com/apache/hudi/blob/6f9b02decb5bb2b83709b1b6ec04a97e4d102c11/hudi-spark-datasource/hudi-spark/src/main/java/org/apache/hudi/QuickstartUtils.java#L60)) to ensure trip records are unique within each partition. For more info, refer to
+[Modeling data stored in Hudi](https://hudi.apache.org/learn/faq/#how-do-i-model-the-data-stored-in-hudi)
and for info on ways to ingest data into Hudi, refer to [Writing Hudi Tables](/docs/writing_data).
Here we are using the default write operation : `upsert`. If you have a workload without updates, you can also issue
-`insert` or `bulk_insert` operations which could be faster. To know more, refer to [Write operations](/docs/writing_data#write-operations)
+`insert` or `bulk_insert` operations which could be faster. To know more, refer to [Write operations](/docs/write_operations)
:::
</TabItem>
@@ -124,12 +124,12 @@ df.write.format("hudi").
:::info
`mode(Overwrite)` overwrites and recreates the table if it already exists.
You can check the data generated under `/tmp/hudi_trips_cow/<region>/<country>/<city>/`. We provided a record key
-(`uuid` in [schema](https://github.com/apache/hudi/blob/master/hudi-spark/src/main/java/org/apache/hudi/QuickstartUtils.java#L58)), partition field (`region/country/city`) and combine logic (`ts` in
-[schema](https://github.com/apache/hudi/blob/master/hudi-spark/src/main/java/org/apache/hudi/QuickstartUtils.java#L58)) to ensure trip records are unique within each partition. For more info, refer to
-[Modeling data stored in Hudi](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=113709185#FAQ-HowdoImodelthedatastoredinHudi)
+(`uuid` in [schema](https://github.com/apache/hudi/blob/2e6e302efec2fa848ded4f88a95540ad2adb7798/hudi-spark-datasource/hudi-spark/src/main/java/org/apache/hudi/QuickstartUtils.java#L60)), partition field (`region/country/city`) and combine logic (`ts` in
+[schema](https://github.com/apache/hudi/blob/2e6e302efec2fa848ded4f88a95540ad2adb7798/hudi-spark-datasource/hudi-spark/src/main/java/org/apache/hudi/QuickstartUtils.java#L60)) to ensure trip records are unique within each partition. For more info, refer to
+[Modeling data stored in Hudi](https://hudi.apache.org/learn/faq/#how-do-i-model-the-data-stored-in-hudi)
and for info on ways to ingest data into Hudi, refer to [Writing Hudi Tables](/docs/writing_data).
Here we are using the default write operation : `upsert`. If you have a workload without updates, you can also issue
-`insert` or `bulk_insert` operations which could be faster. To know more, refer to [Write operations](/docs/writing_data#write-operations)
+`insert` or `bulk_insert` operations which could be faster. To know more, refer to [Write operations](/docs/write_operations)
:::
</TabItem>