[hudi] branch asf-site updated: [HUDI-6552][DOCS] Restructure and reorganize FAQ (#9231)

sivabalan Wed, 19 Jul 2023 12:12:05 -0700

This is an automated email from the ASF dual-hosted git repository.

sivabalan pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git



The following commit(s) were added to refs/heads/asf-site by this push:
     new 9594aad72c9 [HUDI-6552][DOCS] Restructure and reorganize FAQ (#9231)
9594aad72c9 is described below

commit 9594aad72c9cb130a99fb642ab0377878931ab36
Author: Bhavani Sudha Saktheeswaran <[email protected]>
AuthorDate: Wed Jul 19 12:11:33 2023 -0700

    [HUDI-6552][DOCS] Restructure and reorganize FAQ (#9231)
    
    - Add more questions in FAQ
    - Split all questions in FAQ to these pages - faq, troubleshooting and tuning
    - Organize questions based on component within each page
---
 website/docs/faq.md             | 662 +++++++++++++++++++---------------------
 website/docs/querying_data.md   |   2 +-
 website/docs/troubleshooting.md | 229 +++++++-------
 website/docs/tuning-guide.md    |  71 ++++-
 website/src/pages/tech-specs.md |   6 +-
 5 files changed, 512 insertions(+), 458 deletions(-)

diff --git a/website/docs/faq.md b/website/docs/faq.md
index d4801ae10f6..f27dcd52a77 100644
--- a/website/docs/faq.md
+++ b/website/docs/faq.md
@@ -8,28 +8,29 @@ last_modified_at: 2021-08-18T15:59:57-04:00
 ## General
 
 ### When is Hudi useful for me or my organization?
-   
+
 If you are looking to quickly ingest data onto HDFS or cloud storage, Hudi can 
provide you tools to [help](https://hudi.apache.org/docs/writing_data/). Also, 
if you have ETL/hive/spark jobs which are slow/taking up a lot of resources, 
Hudi can potentially help by providing an incremental approach to reading and 
writing data.
 
 As an organization, Hudi can help you build an [efficient data 
lake](https://docs.google.com/presentation/d/1FHhsvh70ZP6xXlHdVsAI0g__B_6Mpto5KQFlZ0b8-mM/edit#slide=id.p),
 solving some of the most complex, low-level storage management problems, while 
putting data into the hands of your data analysts, engineers and scientists much 
quicker.
 
 ### What are some non-goals for Hudi?
 
-Hudi is not designed for any OLTP use-cases, where typically you are using 
existing NoSQL/RDBMS data stores. Hudi cannot replace your in-memory analytical 
database (at-least not yet!). Hudi support near-real time ingestion in the 
order of few minutes, trading off latency for efficient batching. If you truly 
desirable sub-minute processing delays, then stick with your favorite stream 
processing solution. 
+Hudi is not designed for any OLTP use-cases, where typically you are using 
existing NoSQL/RDBMS data stores. Hudi cannot replace your in-memory analytical 
database (at least not yet!). Hudi supports near-real-time ingestion on the 
order of a few minutes, trading off latency for efficient batching. If you truly 
need sub-minute processing delays, then stick with your favorite stream 
processing solution.
 
 ### What is incremental processing? Why does Hudi docs/talks keep talking 
about it?
 
 Incremental processing was first introduced by Vinoth Chandar, in the O'Reilly 
[blog](https://www.oreilly.com/content/ubers-case-for-incremental-processing-on-hadoop/),
 that set off most of this effort. In purely technical terms, incremental 
processing merely refers to writing mini-batch programs in streaming processing 
style. Typical batch jobs consume **all input** and recompute **all output**, 
every few hours. Typical stream processing jobs consume some **new input** and 
recompute **n [...]
 
-While we can merely refer to this as stream processing, we call it 
*incremental processing*, to distinguish from purely stream processing 
pipelines built using Apache Flink, Apache Apex or Apache Kafka Streams.
+While we can merely refer to this as stream processing, we call it 
_incremental processing_, to distinguish from purely stream processing 
pipelines built using Apache Flink or Apache Kafka Streams.
 
-### What is the difference between copy-on-write (COW) vs merge-on-read (MOR) 
storage types?
+### How is Hudi optimized for CDC and streaming use cases?
 
-**Copy On Write** - This storage type enables clients to ingest data on 
columnar file formats, currently parquet. Any new data that is written to the 
Hudi dataset using COW storage type, will write new parquet files. Updating an 
existing set of rows will result in a rewrite of the entire parquet files that 
collectively contain the affected rows being updated. Hence, all writes to such 
datasets are limited by parquet writing performance, the larger the parquet 
file, the higher is the time [...]
+One of the core use-cases for Apache Hudi is enabling seamless, efficient 
database ingestion to your lake, and change data capture is a direct 
application of that. Hudi’s core design primitives support fast upserts and 
deletes of data that are suitable for CDC and streaming use cases. Here is a 
glimpse of some of the challenges accompanying streaming and CDC workloads that 
Hudi handles efficiently out of the box.
 
-**Merge On Read** - This storage type enables clients to  ingest data quickly 
onto row based data format such as avro. Any new data that is written to the 
Hudi dataset using MOR table type, will write new log/delta files that 
internally store the data as avro encoded bytes. A compaction process 
(configured as inline or asynchronous) will convert log file format to columnar 
file format (parquet). Two different InputFormats expose 2 different views of 
this data, Read Optimized view exposes [...]
-
-More details can be found [here](https://hudi.apache.org/docs/concepts/) and 
also [Design And 
Architecture](https://cwiki.apache.org/confluence/display/HUDI/Design+And+Architecture).
+*   **_Processing of deletes:_** Deletes are treated no differently than 
updates and are logged with the same filegroups where the corresponding keys 
exist. This helps process deletes as fast as regular inserts and updates, and 
Hudi processes deletes at the file group level using compaction in MOR tables. 
This can be very expensive in other open source systems that store deletes in 
files separate from data files and incur N(Data files)\*N(Delete files) merge 
cost to process deletes ever [...]
+*   **_Operational overhead of merging deletes at scale:_** When deletes are 
stored as separate files without any notion of data locality, the merging of 
data and deletes can become a runaway job that cannot complete in time due to 
various reasons (Spark retries, executor failure, OOM, etc.). As more data 
files and delete files are added, the merge becomes even more expensive and 
complex later on, making it hard to manage in practice and adding operational 
overhead. Hudi removes this complex [...]
+*   **_File sizing with updates:_** Other open source systems process updates 
by generating new data files for inserting the new records after deletion, 
where both data files and delete files get introduced for every batch of 
updates. This leads to a small-file problem and requires file sizing. Hudi, in 
contrast, embraces mutations to the data and manages the table automatically, 
keeping file sizes in check without passing the burden of file sizing to users 
as manual maintenance.
+*   **_Support for partial updates and payload ordering:_** Hudi supports 
partial updates, where an existing record is updated only for the fields that 
are non-null in newer records (those with newer timestamps). Similarly, Hudi 
supports payload ordering by timestamp through specific payload 
implementations, where late-arriving data with older timestamps is ignored or 
dropped. Users can also implement and plug in custom logic to handle their own 
requirements.
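
The partial-update and ordering behaviour described in the last bullet can be illustrated with a minimal sketch. This is not Hudi's payload code: the class `PartialUpdateSketch`, the map-based record and the `ts` ordering field are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

class PartialUpdateSketch {
  // Hypothetical merge of a stored record with an incoming one.
  // Records are plain maps; "ts" is an assumed ordering field of type Long.
  static Map<String, Object> merge(Map<String, Object> stored, Map<String, Object> incoming) {
    long storedTs = (Long) stored.get("ts");
    long incomingTs = (Long) incoming.get("ts");
    if (incomingTs < storedTs) {
      return stored; // late-arriving data with an older timestamp is ignored
    }
    Map<String, Object> merged = new HashMap<>(stored);
    for (Map.Entry<String, Object> e : incoming.entrySet()) {
      if (e.getValue() != null) {
        merged.put(e.getKey(), e.getValue()); // only non-null fields overwrite
      }
    }
    return merged;
  }

  public static void main(String[] args) {
    Map<String, Object> stored = new HashMap<>();
    stored.put("name", "a");
    stored.put("city", "x");
    stored.put("ts", 10L);
    Map<String, Object> incoming = new HashMap<>();
    incoming.put("name", null);
    incoming.put("city", "y");
    incoming.put("ts", 20L);
    System.out.println(merge(stored, incoming));
  }
}
```

In this sketch a null field in the newer record leaves the stored value untouched, while an older incoming record is dropped as a whole.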
 
 ### How do I choose a storage type for my workload?
 
@@ -37,21 +38,21 @@ A key goal of Hudi is to provide **upsert functionality** 
that is orders of magn
 
 Choose Copy-on-write storage if :
 
- - You are looking for a simple alternative, that replaces your existing 
parquet tables without any need for real-time data.
- - Your current job is rewriting entire table/partition to deal with updates, 
while only a few files actually change in each partition.
- - You are happy keeping things operationally simpler (no compaction etc), 
with the ingestion/write performance bound by the [parquet file 
size](https://hudi.apache.org/docs/configurations#hoodieparquetmaxfilesize) and 
the number of such files affected/dirtied by updates
- - Your workload is fairly well-understood and does not have sudden bursts of 
large amount of update or inserts to older partitions. COW absorbs all the 
merging cost on the writer side and thus these sudden changes can clog up your 
ingestion and interfere with meeting normal mode ingest latency targets.
+*   You are looking for a simple alternative, that replaces your existing 
parquet tables without any need for real-time data.
+*   Your current job is rewriting entire table/partition to deal with updates, 
while only a few files actually change in each partition.
+*   You are happy keeping things operationally simpler (no compaction etc), 
with the ingestion/write performance bound by the [parquet file 
size](https://hudi.apache.org/docs/configurations#hoodieparquetmaxfilesize) and 
the number of such files affected/dirtied by updates.
+*   Your workload is fairly well-understood and does not have sudden bursts of 
large amounts of updates or inserts to older partitions. COW absorbs all the 
merging cost on the writer side and thus these sudden changes can clog up your 
ingestion and interfere with meeting normal mode ingest latency targets.
 
 Choose merge-on-read storage if :
 
- - You want the data to be ingested as quickly & queryable as much as possible.
- - Your workload can have sudden spikes/changes in pattern (e.g bulk updates 
to older transactions in upstream database causing lots of updates to old 
partitions on DFS). Asynchronous compaction helps amortize the write 
amplification caused by such scenarios, while normal ingestion keeps up with 
incoming stream of changes.
+*   You want the data to be ingested and queryable as quickly as possible.
+*   Your workload can have sudden spikes/changes in pattern (e.g. bulk updates 
to older transactions in upstream database causing lots of updates to old 
partitions on DFS). Asynchronous compaction helps amortize the write 
amplification caused by such scenarios, while normal ingestion keeps up with 
incoming stream of changes.
 
 Regardless of what you choose, Hudi provides
 
- - Snapshot isolation and atomic write of batch of records
- - Incremental pulls
- - Ability to de-duplicate data
+*   Snapshot isolation and atomic writes of batches of records
+*   Incremental pulls
+*   Ability to de-duplicate data
 
 Find more [here](https://hudi.apache.org/docs/concepts/).
 
@@ -63,89 +64,89 @@ Nonetheless, Hudi is designed very much like a database and 
provides similar fun
 
 ### How do I model the data stored in Hudi?
 
-When writing data into Hudi, you model the records like how you would on a 
key-value store - specify a key field (unique for a single partition/across 
dataset), a partition field (denotes partition to place key into) and 
preCombine/combine logic that specifies how to handle duplicates in a batch of 
records written. This model enables Hudi to enforce primary key constraints 
like you would get on a database table. See 
[here](https://hudi.apache.org/docs/writing_data/) for an example.
+When writing data into Hudi, you model the records like how you would on a 
key-value store - specify a key field (unique for a single partition/across 
table), a partition field (denotes partition to place key into) and 
preCombine/combine logic that specifies how to handle duplicates in a batch of 
records written. This model enables Hudi to enforce primary key constraints 
like you would get on a database table. See 
[here](https://hudi.apache.org/docs/writing_data/) for an example.
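
As a rough sketch of this model, the key, partition and preCombine fields are typically supplied as datasource write options. The option keys below are Hudi configuration names; the class name, table name and field names (`log_events`, `event_id`, `event_date`, `ts`) are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class HudiWriteOptionsSketch {
  // Builds datasource write options; table and field names are hypothetical.
  static Map<String, String> writeOptions() {
    Map<String, String> opts = new LinkedHashMap<>();
    opts.put("hoodie.table.name", "log_events");
    // Key field: identifies a record within a partition (or across the table).
    opts.put("hoodie.datasource.write.recordkey.field", "event_id");
    // Partition field: decides which partition the key is placed into.
    opts.put("hoodie.datasource.write.partitionpath.field", "event_date");
    // preCombine field: resolves duplicates of a key within one batch.
    opts.put("hoodie.datasource.write.precombine.field", "ts");
    opts.put("hoodie.datasource.write.operation", "upsert");
    return opts;
  }

  public static void main(String[] args) {
    writeOptions().forEach((k, v) -> System.out.println(k + "=" + v));
  }
}
```

With Spark on the classpath, a map like this could be passed to a write such as `df.write().format("hudi").options(opts).mode(SaveMode.Append).save(basePath)`, where `df` and `basePath` are assumed to exist.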
 
-When querying/reading data, Hudi just presents itself as a json-like 
hierarchical table, everyone is used to querying using Hive/Spark/Presto over 
Parquet/Json/Avro. 
+When querying/reading data, Hudi just presents itself as a JSON-like 
hierarchical table that everyone is used to querying using Hive/Spark/Presto 
over Parquet/JSON/Avro.
 
 ### Why does Hudi require a key field to be configured?
 
-Hudi was designed to support fast record level Upserts and thus requires a key 
to identify whether an incoming record is 
-an insert or update or delete, and process accordingly. Additionally, Hudi 
automatically maintains indexes on this primary 
-key and for many use-cases like CDC, ensuring such primary key constraints is 
crucial to ensure data quality. In this context, 
-pre combine key helps reconcile multiple records with same key in a single 
batch of input records. Even for append-only data 
-streams, Hudi supports key based de-duplication before inserting records. For 
e-g; you may have atleast once data integration 
-systems like Kafka MirrorMaker that can introduce duplicates during failures. 
Even for plain old batch pipelines, keys 
-help eliminate duplication that could be caused by backfill pipelines, where 
commonly it's unclear what set of records 
+Hudi was designed to support fast record level Upserts and thus requires a key 
to identify whether an incoming record is
+an insert or update or delete, and process accordingly. Additionally, Hudi 
automatically maintains indexes on this primary
+key and for many use-cases like CDC, ensuring such primary key constraints is 
crucial to ensure data quality. In this context,
+pre combine key helps reconcile multiple records with same key in a single 
batch of input records. Even for append-only data
+streams, Hudi supports key based de-duplication before inserting records. For 
example, you may have at-least-once data integration
+systems like Kafka MirrorMaker that can introduce duplicates during failures. 
Even for plain old batch pipelines, keys
+help eliminate duplication that could be caused by backfill pipelines, where 
commonly it's unclear what set of records
 need to be re-written. We are actively working on making keys easier by only 
requiring them for Upsert and/or automatically
-generate the key internally (much like RDBMS row_ids)
-
-### Does Hudi support cloud storage/object stores?
-
-Yes. Generally speaking, Hudi is able to provide its functionality on any 
Hadoop FileSystem implementation and thus can read and write datasets on [Cloud 
stores](https://hudi.apache.org/docs/cloud) (Amazon S3 or Microsoft Azure or 
Google Cloud Storage). Over time, Hudi has also incorporated specific design 
aspects that make building Hudi datasets on the cloud easy, such as 
[consistency checks for 
s3](https://hudi.apache.org/docs/configurations#hoodieconsistencycheckenabled), 
Zero moves/r [...]
-
-### What versions of Hive/Spark/Hadoop are support by Hudi?
-
-As of September 2019, Hudi can support Spark 2.1+, Hive 2.x, Hadoop 2.7+ (not 
Hadoop 3).
+generating the key internally (much like RDBMS row\_ids).
 
-### How does Hudi actually store data inside a dataset?
+### How does Hudi actually store data inside a table?
 
-At a high level, Hudi is based on MVCC design that writes data to versioned 
parquet/base files and log files that contain changes to the base file. All the 
files are stored under a partitioning scheme for the dataset, which closely 
resembles how Apache Hive tables are laid out on DFS. Please refer 
[here](https://hudi.apache.org/docs/concepts/) for more details.
+At a high level, Hudi is based on MVCC design that writes data to versioned 
parquet/base files and log files that contain changes to the base file. All the 
files are stored under a partitioning scheme for the table, which closely 
resembles how Apache Hive tables are laid out on DFS. Please refer 
[here](https://hudi.apache.org/docs/concepts/) for more details.
 
 ### How Hudi handles partition evolution requirements ?
-Hudi recommends keeping coarse grained top level partition paths e.g date(ts) 
and within each such partition do clustering in a flexible way to z-order, sort 
data based on interested columns. This provides excellent performance by  : 
minimzing the number of files in each partition, while still packing data that 
will be queried together physically closer (what partitioning aims to achieve).
 
-Let's take an example of a table, where we store log_events with two fields 
`ts` (time at which event was produced) and `cust_id` (user for which event was 
produced) and a common option is to partition by both date(ts) and cust_id.
+Hudi recommends keeping coarse-grained top-level partition paths, e.g. 
date(ts), and clustering within each such partition in a flexible way to 
z-order or sort data on the columns of interest. This provides excellent 
performance by minimizing the number of files in each partition, while still 
packing data that will be queried together physically closer (what 
partitioning aims to achieve).
+
+Let's take an example of a table, where we store log\_events with two fields 
`ts` (time at which event was produced) and `cust_id` (user for which event was 
produced) and a common option is to partition by both date(ts) and cust\_id.
 Some users may want to start granular with hour(ts) and then later evolve to 
new partitioning scheme say date(ts). But this means, the number of partitions 
in the table could be very high - 365 days x 1K customers = at-least 365K 
potentially small parquet files, that can significantly slow down queries, 
facing throttling issues on the actual S3/DFS reads.
 
-For the afore mentioned reasons, we don't recommend mixing different 
partitioning schemes within the same table, since it adds operational 
complexity, and unpredictable performance. 
+For the aforementioned reasons, we don't recommend mixing different 
partitioning schemes within the same table, since it adds operational 
complexity and leads to unpredictable performance.
 Old data stays in old partitions and only new data gets into newer evolved 
partitions. If you want to tidy up the table, you have to rewrite all 
partitions/data anyway! This is why we suggest starting with coarse-grained 
partitions
 and leaning on clustering techniques to optimize for query performance.
 
 We find that most datasets have at-least one high fidelity field, that can be 
used as a coarse partition. Clustering strategies in Hudi provide a lot of 
power - you can alter which partitions to cluster, and which fields to cluster 
each by etc.
-Unlike Hive partitioning, Hudi does not remove the partition field from the 
data files i.e if you write new partition paths, it does not mean old 
partitions need to be rewritten. 
-Partitioning by itself is a relic of the Hive era; Hudi is working on 
replacing partitioning with database like indexing schemes/functions, 
+Unlike Hive partitioning, Hudi does not remove the partition field from the 
data files i.e if you write new partition paths, it does not mean old 
partitions need to be rewritten.
+Partitioning by itself is a relic of the Hive era; Hudi is working on 
replacing partitioning with database like indexing schemes/functions,
 for even more flexibility and to get away from the Hive-style partition evolution route.
 
+## Concepts
 
-## Using Hudi
+### How does Hudi ensure atomicity?
 
-### What are some ways to write a Hudi dataset?
+Hudi writers atomically move an inflight write operation to a "completed" 
state by writing an object/file to the 
[timeline](https://hudi.apache.org/docs/next/timeline) folder, identifying the 
write operation with an instant time that denotes the time the action is deemed 
to have occurred. This is achieved on the underlying DFS (in the case of 
S3/Cloud Storage, by an atomic PUT operation) and can be observed by files of 
the pattern `<instant>.<action>.<state>` in Hudi’s timeline.
 
-Typically, you obtain a set of partial updates/inserts from your source and 
issue [write operations](https://hudi.apache.org/docs/write_operations/) 
against a Hudi dataset.  If you ingesting data from any of the standard sources 
like Kafka, or tailing DFS, the [Hudi 
Streamer](https://hudi.apache.org/docs/hoodie_deltastreamer#hudi-streamer) tool 
is invaluable and provides an easy, self-managed solution to getting data 
written into Hudi. You can also write your own code to capture data fro [...]
+### Does Hudi extend the Hive table layout?
 
-### How is a Hudi job deployed?
+Hudi is very different from Hive in important aspects described below. 
However, based on practical considerations, it chooses to be compatible with 
Hive table layout by adopting partitioning, schema evolution and being 
queryable through the Hive query engine. Here are the key aspects where Hudi differs:
 
-The nice thing about Hudi writing is that it just runs like any other spark 
job would on a YARN/Mesos or even a K8S cluster. So you could simply use the 
Spark UI to get visibility into write operations.
+*   Unlike Hive, Hudi does not remove the partition columns from the data 
files. Hudi in fact adds record level [meta 
fields](https://hudi.apache.org/tech-specs#meta-fields) including instant time, 
primary record key, and partition path to the data to support efficient upserts 
and [incremental 
queries/ETL](https://hudi.apache.org/learn/use_cases/#incremental-processing-pipelines).
  Hudi tables can be non-partitioned and the Hudi metadata table adds rich 
indexes on Hudi tables which are b [...]
+*   Hive advocates partitioning as the main remedy for most performance-based 
issues. Features like partition evolution and hidden partitioning are primarily 
based on this Hive based principle of partitioning and aim to tackle the 
metadata problem partially. Hudi, in contrast, biases toward coarse-grained 
partitioning and emphasizes 
[clustering](https://hudi.apache.org/docs/clustering) for more fine-grained 
partitioning. Further, users can strategize and evolve the clustering 
asynchronously whic [...]
+*   Hudi considers partition evolution an anti-pattern and avoids such 
schemes because query performance becomes inconsistent, depending on which 
part of the table is being queried. Hudi’s design favors consistent 
performance and acknowledges that partitioning/tables may need to be 
redesigned to achieve it.
 
-### How can I now query the Hudi dataset I just wrote?
+### What concurrency control approaches does Hudi adopt?
 
-Unless Hive sync is enabled, the dataset written by Hudi using one of the 
methods above can simply be queries via the Spark datasource like any other 
source. 
+Hudi provides snapshot isolation between all three types of processes - 
writers, readers, and table services, meaning they all operate on a consistent 
snapshot of the table. Hudi provides optimistic concurrency control (OCC) 
between writers, while providing lock-free, non-blocking MVCC-based concurrency 
control between writers and table-services and between different table 
services. Widely accepted database literature like “[Architecture of a database 
system, pg 81](https://dsf.berkeley. [...]
 
-```scala
-val hoodieROView = spark.read.format("org.apache.hudi").load(basePath + 
"/path/to/partitions/*")
-val hoodieIncViewDF = spark.read().format("org.apache.hudi")
-     .option(DataSourceReadOptions.VIEW_TYPE_OPT_KEY(), 
DataSourceReadOptions.VIEW_TYPE_INCREMENTAL_OPT_VAL())
-     .option(DataSourceReadOptions.BEGIN_INSTANTTIME_OPT_KEY(), 
<beginInstantTime>)
-     .load(basePath);
-```
+### Hudi’s commits are based on transaction start time instead of completed 
time. Does this cause data loss or inconsistency in case of incremental and 
time travel queries?
 
-```java
-Limitations:
+Let’s take a closer look at the scenario here: two commits C1 and C2 start, 
with C2 starting later than C1, and the later commit (C2) finishes first. This 
leaves the inflight transaction of the earlier commit (C1) before the completed 
write of the later transaction (C2) in Hudi’s timeline. This is not an uncommon 
scenario, especially with various ingestion needs such as backfilling, 
deleting, bootstrapping, etc. alongside regular writes. When/Whether the first 
job would commit will depend on [...]
 
-Note that currently the reading realtime view natively out of the Spark 
datasource is not supported. Please use the Hive path below
-```
+In these scenarios, it might be tempting to think of data inconsistencies/data 
loss when using Hudi’s incremental queries. However, Hudi applies special 
handling in incremental queries to ensure that no data is served beyond the 
point where Hudi sees an inflight instant in its timeline, so no data is lost 
or dropped. In this case, on seeing C1’s inflight commit (publish to timeline is 
atomic), C2 data (which is > C1 in the timeline) is not served until C1 
inflight transitions to a terminal sta [...]
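
The rule described above (serve nothing beyond the first inflight instant) can be sketched as follows. This only illustrates the idea and is not Hudi's timeline code; the `Instant` class and the `servable` method are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class IncrementalReadSketch {
  // Hypothetical stand-in for a timeline entry; not a Hudi class.
  static final class Instant {
    final String time;       // instant time, compared lexicographically
    final boolean completed; // false means the write is still inflight

    Instant(String time, boolean completed) {
      this.time = time;
      this.completed = completed;
    }
  }

  // Returns the completed instant times after beginTime, in timeline order,
  // stopping at the first inflight instant.
  static List<String> servable(List<Instant> timeline, String beginTime) {
    List<Instant> sorted = new ArrayList<>(timeline);
    sorted.sort((a, b) -> a.time.compareTo(b.time));
    List<String> result = new ArrayList<>();
    for (Instant i : sorted) {
      if (!i.completed) {
        break; // nothing beyond an inflight instant is served
      }
      if (i.time.compareTo(beginTime) > 0) {
        result.add(i.time);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    // C1 ("002") is inflight, C2 ("003") has already completed.
    List<Instant> timeline = Arrays.asList(
        new Instant("001", true), new Instant("002", false), new Instant("003", true));
    System.out.println(servable(timeline, "000"));
  }
}
```

Once the inflight instant reaches a completed state, the same call would also return the later instants, which is the liveness trade-off this answer describes.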
 
-if Hive Sync is enabled in the [Hudi 
Streamer](https://github.com/apache/hudi/blob/d3edac4612bde2fa9deca9536801dbc48961fb95/docker/demo/sparksql-incremental.commands#L50)
 tool or 
[datasource](https://hudi.apache.org/docs/configurations#hoodiedatasourcehive_syncenable),
 the dataset is available in Hive as a couple of tables, that can now be read 
using HiveQL, Presto or SparkSQL. See 
[here](https://hudi.apache.org/docs/querying_data/) for more.
+### How does Hudi plan to address the liveness issue above for incremental 
queries?
 
-### How does Hudi handle duplicate record keys in an input?
+Hudi has had a proposal to streamline/improve this experience by adding a 
transition-time to our timeline, which will remove the [liveness 
sacrifice](https://en.wikipedia.org/wiki/Safety_and_liveness_properties) 
currently being made and make it easier to understand. This has been delayed 
for a few reasons: (a) large hosted query engines and users not upgrading fast 
enough; (b) the issues brought up - 
\[[1](https://hudi.apache.org/docs/next/faq#does-hudis-use-of-wall-clock-timestamp-for-i
 [...]
+
+### Does Hudi’s use of wall clock timestamp for instants pose any clock skew 
issues?
 
-When issuing an `upsert` operation on a dataset and the batch of records 
provided contains multiple entries for a given key, then all of them are 
reduced into a single final value by repeatedly calling payload class's 
[preCombine()](https://github.com/apache/hudi/blob/d3edac4612bde2fa9deca9536801dbc48961fb95/hudi-common/src/main/java/org/apache/hudi/common/model/HoodieRecordPayload.java#L40)
 method . By default, we pick the record with the greatest value (determined by 
calling .compareTo [...]
+Theoretically speaking, a clock skew between two writers can result in 
different notions of time, and order the timeline differently. But, the current 
NTP implementations and regions standardizing on UTC make this very unlikely 
in practice. Even many popular OLTP-based systems such as DynamoDB 
and Cassandra use timestamps for record level conflict detection, and cloud 
providers/OSS NTP are moving towards atomic/synchronized clocks all the time 
\[[1](https://aws.amazon.com/about- [...]
 
-For an insert or bulk_insert operation, no such pre-combining is performed. 
Thus, if your input contains duplicates, the dataset would also contain 
duplicates. If you don't want duplicate records either issue an upsert or 
consider specifying option to de-duplicate input in either 
[datasource](https://hudi.apache.org/docs/configurations#hoodiedatasourcewriteinsertdropduplicates)
 or [Hudi 
Streamer](https://github.com/apache/hudi/blob/d3edac4612bde2fa9deca9536801dbc48961fb95/hudi-utilities/
 [...]
+Further - Hudi’s commit time can be a logical time and need not strictly be a 
timestamp. If there are still uniqueness concerns over clock skew, it is easy 
for Hudi to further extend the timestamp implementation with salts or employ 
[TrueTime](https://www.cockroachlabs.com/blog/living-without-atomic-clocks/) 
approaches that have been proven at planet scale. In short, this is not a 
design issue, but more of a pragmatic implementation choice, that allows us to 
implement unique features lik [...]
+
+## Writing Tables
+
+### What are some ways to write a Hudi table?
+
+Typically, you obtain a set of partial updates/inserts from your source and 
issue [write operations](https://hudi.apache.org/docs/write_operations/) 
against a Hudi table. If you are ingesting data from any of the standard sources 
like Kafka, or tailing DFS, the [delta 
streamer](https://hudi.apache.org/docs/hoodie_deltastreamer#deltastreamer) tool 
is invaluable and provides an easy, self-managed solution to getting data 
written into Hudi. You can also write your own code to capture data from  [...]
+
+### How is a Hudi writer job deployed?
+
+The nice thing about Hudi writing is that it just runs like any other Spark 
job would on a YARN/Mesos or even a K8S cluster. So you could simply use the 
Spark UI to get visibility into write operations.
 
 ### Can I implement my own logic for how input records are merged with record 
on storage?
 
-Here is the payload interface that is used in Hudi to represent any hudi 
record. 
+Here is the payload interface that is used in Hudi to represent any Hudi 
record.
 
 ```java
 public interface HoodieRecordPayload<T extends HoodieRecordPayload> extends 
Serializable {
@@ -157,7 +158,6 @@ public interface HoodieRecordPayload<T extends 
HoodieRecordPayload> extends Seri
    * @return the combined value
    */
   default T preCombine(T another, Properties properties);
- 
 /**
    * This methods lets you write custom merging/combining logic to produce new 
values as a function of current value on storage and whats contained
    * in this object. Implementations can leverage properties if required.
@@ -184,7 +184,6 @@ public interface HoodieRecordPayload<T extends 
HoodieRecordPayload> extends Seri
    */
   @PublicAPIMethod(maturity = ApiMaturityLevel.STABLE)
   default Option<IndexedRecord> getInsertValue(Schema schema, Properties 
properties) throws IOException;
- 
 /**
    * This method can be used to extract some metadata from 
HoodieRecordPayload. The metadata is passed to {@code 
WriteStatus.markSuccess()} and
    * {@code WriteStatus.markFailure()} in order to compute some aggregate 
metrics using the metadata in the context of a write success or failure.
@@ -194,29 +193,30 @@ public interface HoodieRecordPayload<T extends 
HoodieRecordPayload> extends Seri
   default Option<Map<String, String>> getMetadata() {
     return Option.empty();
   }
- 
 }
 ```
 
-As you could see, ([combineAndGetUpdateValue(), 
getInsertValue()](https://github.com/apache/hudi/blob/master/hudi-common/src/main/java/org/apache/hudi/common/model/HoodieRecordPayload.java))
 that control how the record on storage is combined with the incoming 
update/insert to generate the final value to be written back to storage. 
preCombine() is used to merge records within the same incoming batch. 
+As you can see, [combineAndGetUpdateValue(), 
getInsertValue()](https://github.com/apache/hudi/blob/master/hudi-common/src/main/java/org/apache/hudi/common/model/HoodieRecordPayload.java)
 control how the record on storage is combined with the incoming 
update/insert to generate the final value to be written back to storage. 
preCombine() is used to merge records within the same incoming batch.
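
To make the role of preCombine() concrete, here is a minimal sketch of reducing a batch to one record per key by keeping the greatest ordering value. It does not implement `HoodieRecordPayload`; the `Rec` class, its fields and the `dedupe` helper are hypothetical and only mirror the idea.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class PreCombineSketch {
  // Hypothetical record with a key, an ordering value and a payload string.
  static final class Rec {
    final String key;
    final long orderingVal;
    final String value;

    Rec(String key, long orderingVal, String value) {
      this.key = key;
      this.orderingVal = orderingVal;
      this.value = value;
    }

    // Mirrors the idea of preCombine(): keep the record with the greater ordering value.
    Rec preCombine(Rec other) {
      return other.orderingVal > this.orderingVal ? other : this;
    }
  }

  // Reduces a batch to one record per key by repeatedly applying preCombine.
  static Map<String, Rec> dedupe(List<Rec> batch) {
    Map<String, Rec> out = new LinkedHashMap<>();
    for (Rec r : batch) {
      out.merge(r.key, r, Rec::preCombine);
    }
    return out;
  }

  public static void main(String[] args) {
    List<Rec> batch = Arrays.asList(
        new Rec("k1", 1L, "old"), new Rec("k2", 5L, "only"), new Rec("k1", 3L, "new"));
    dedupe(batch).values().forEach(r -> System.out.println(r.key + " -> " + r.value));
  }
}
```

A custom payload would replace the comparison in `preCombine` with its own merge rule, for example combining fields instead of picking one record.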
 
 ### How do I delete records in the dataset using Hudi?
 
 GDPR has made deletes a must-have tool in everyone's data management toolbox. 
Hudi supports both soft and hard deletes. For details on how to actually 
perform them, see [here](https://hudi.apache.org/docs/writing_data/#deletes).
 
-### Does deleted records appear in Hudi's incremental query results?
+### Should I need to worry about deleting all copies of the records in case of 
duplicates?
 
-Soft Deletes (unlike hard deletes) do appear in the incremental pull query 
results. So, if you need a mechanism to propagate deletes to downstream tables, 
you can use Soft deletes.
+No. Hudi removes all the copies of a record key when deletes are issued. Here 
is the long form explanation - Sometimes accidental user errors can lead to 
duplicates introduced into a Hudi table by either [concurrent 
inserts](https://hudi.apache.org/docs/next/faq#can-concurrent-inserts-cause-duplicates)
 or by [not deduping the input 
records](https://hudi.apache.org/docs/next/faq#can-single-writer-inserts-have-duplicates)
 for an insert operation. However, using the right index (e.g., in th [...]
 
-### How do I migrate my data to Hudi?
+### How does Hudi handle duplicate record keys in an input?
 
-Hudi provides built in support for rewriting your entire dataset into Hudi 
one-time using the HDFSParquetImporter tool available from the hudi-cli . You 
could also do this via a simple read and write of the dataset using the Spark 
datasource APIs. Once migrated, writes can be performed using normal means 
discussed 
[here](https://hudi.apache.org/learn/faq#what-are-some-ways-to-write-a-hudi-dataset).
 This topic is discussed in detail 
[here](https://hudi.apache.org/docs/migration_guide/), i [...]
+When an `upsert` operation is issued on a table and the batch of records 
provided contains multiple entries for a given key, all of them are 
reduced into a single final value by repeatedly calling the payload class's 
[preCombine()](https://github.com/apache/hudi/blob/d3edac4612bde2fa9deca9536801dbc48961fb95/hudi-common/src/main/java/org/apache/hudi/common/model/HoodieRecordPayload.java#L40)
 method. By default, we pick the record with the greatest value (determined by 
calling .compareTo() [...]
 
-### How can I pass hudi configurations to my spark job?
+For an insert or bulk\_insert operation, no such pre-combining is performed. 
Thus, if your input contains duplicates, the table would also contain 
duplicates. If you don't want duplicate records, either issue an upsert or 
consider specifying the option to de-duplicate input in either 
[datasource](https://hudi.apache.org/docs/configurations#hoodiedatasourcewriteinsertdropduplicates)
 or 
[deltastreamer](https://github.com/apache/hudi/blob/d3edac4612bde2fa9deca9536801dbc48961fb95/hudi-utilities/s
 [...]
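
The difference between the two behaviours described above can be sketched in plain Java. This is only an illustration under stated assumptions: `Rec` and `preCombineByKey` are hypothetical names, not Hudi classes, and the reduction simply keeps the record with the greater ordering value per key rather than calling a real payload class.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class PreCombineSketch {

    // Hypothetical stand-in for an incoming record; not a Hudi class.
    static final class Rec {
        final String key;
        final long orderingValue;
        final String payload;

        Rec(String key, long orderingValue, String payload) {
            this.key = key;
            this.orderingValue = orderingValue;
            this.payload = payload;
        }
    }

    // Upsert-style reduction: one record per key, keeping the greater ordering value.
    static List<Rec> preCombineByKey(List<Rec> batch) {
        Map<String, Rec> byKey = new LinkedHashMap<>();
        for (Rec r : batch) {
            byKey.merge(r.key, r,
                (current, incoming) -> incoming.orderingValue > current.orderingValue ? incoming : current);
        }
        return new ArrayList<>(byKey.values());
    }

    public static void main(String[] args) {
        List<Rec> batch = Arrays.asList(
            new Rec("id1", 1L, "old"),
            new Rec("id2", 5L, "only"),
            new Rec("id1", 3L, "new"));
        // Upsert-style: the two entries for id1 collapse into a single record.
        System.out.println(preCombineByKey(batch).size());
        // Insert-style: the batch is written as-is, duplicates included.
        System.out.println(batch.size());
    }
}
```

With an insert or bulk\_insert, the batch would reach the table unchanged, which is why the duplicate for `id1` would survive unless de-duplication is requested.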
 
-Hudi configuration options covering the datasource and low level Hudi write 
client (which both Hudi Streamer & datasource internally call) are 
[here](https://hudi.apache.org/docs/configurations/). Invoking *--help* on any 
tool such as Hudi Streamer would print all the usage options. A lot of the 
options that control upsert, file sizing behavior are defined at the write 
client level and below is how we pass them to different options available for 
writing data.
+### How can I pass hudi configurations to my spark writer job?
 
- - For Spark DataSource, you can use the "options" API of DataFrameWriter to 
pass in these configs. 
+Hudi configuration options covering the datasource and low level Hudi write 
client (which both deltastreamer & datasource internally call) are 
[here](https://hudi.apache.org/docs/configurations/). Invoking _\--help_ on any 
tool such as DeltaStreamer would print all the usage options. Many of the 
options that control upsert and file sizing behavior are defined at the write 
client level; below is how to pass them for each of the available ways of 
writing data.
+
+*   For Spark DataSource, you can use the "options" API of DataFrameWriter to 
pass in these configs.
 
 ```scala
 inputDF.write().format("org.apache.hudi")
@@ -225,179 +225,251 @@ inputDF.write().format("org.apache.hudi")
   ...
 ```
 
- - When using `HoodieWriteClient` directly, you can simply construct 
HoodieWriteConfig object with the configs in the link you mentioned.
-
- - When using HoodieStreamer tool to ingest, you can set the configs in 
properties file and pass the file as the cmdline argument "*--props*"
+*   When using `HoodieWriteClient` directly, you can simply construct a 
HoodieWriteConfig object with the configs from the link above.
+*   When using the HoodieDeltaStreamer tool to ingest, you can set the configs in a 
properties file and pass the file as the cmdline argument "_\--props_"
 
 ### How to create Hive style partition folder structure?
 
 By default, Hudi creates the partition folders with just the partition values. 
If you would like partition folders similar to the structure Hive 
generates, with paths that contain key-value pairs such as 
country=us/… or datestr=2021-04-20, that is Hive style (or format) 
partitioning. The paths include both the names of the partition keys and the 
values that each path represents.
 
 To enable hive style partitioning, you need to add this hoodie config when you 
write your data:
-```java
+
+```plain
 hoodie.datasource.write.hive_style_partitioning: true
 ```
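
To make the two layouts concrete, here is a minimal Java sketch of how the folder names differ. `PartitionPathSketch` and `partitionPath` are hypothetical helpers written for this illustration; they are not Hudi's key generator, and the field names are just the ones used in the example above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

final class PartitionPathSketch {

    // Builds a relative partition path from ordered partition field/value pairs.
    static String partitionPath(Map<String, String> partitionValues, boolean hiveStyle) {
        StringJoiner path = new StringJoiner("/");
        for (Map.Entry<String, String> e : partitionValues.entrySet()) {
            // Hive style prefixes each value with "<field>=".
            path.add(hiveStyle ? e.getKey() + "=" + e.getValue() : e.getValue());
        }
        return path.toString();
    }

    public static void main(String[] args) {
        Map<String, String> values = new LinkedHashMap<>();
        values.put("country", "us");
        values.put("datestr", "2021-04-20");
        System.out.println(partitionPath(values, false)); // values only
        System.out.println(partitionPath(values, true));  // key=value pairs
    }
}
```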
 
+### Can I register my Hudi table with Apache Hive metastore?
+
+Yes. This can be performed either via the standalone [Hive Sync 
tool](https://hudi.apache.org/docs/syncing_metastore#hive-sync-tool) or using 
options in [Hudi 
Streamer](https://github.com/apache/hudi/blob/d3edac4612bde2fa9deca9536801dbc48961fb95/docker/demo/sparksql-incremental.commands#L50)
 tool or 
[datasource](https://hudi.apache.org/docs/configurations#hoodiedatasourcehive_syncenable).
+
+### What's Hudi's schema evolution story?
+
+Hudi uses Avro as the internal canonical representation for records, primarily 
due to its nice [schema compatibility & 
evolution](https://docs.confluent.io/platform/current/schema-registry/avro.html)
 properties. This is a key aspect of having reliability in your ingestion or 
ETL pipelines. As long as the schema passed to Hudi (either explicitly in Hudi 
Streamer schema provider configs or implicitly by Spark Datasource's Dataset 
schemas) is backwards compatible (e.g no field deletes, only [...]
+
+Starting 0.11.0, Spark SQL DDL support (experimental) was added for Spark 
3.1.x and Spark 3.2.1 via ALTER TABLE syntax. Please refer to the [schema 
evolution guide](https://hudi.apache.org/docs/schema_evolution) for more 
details on Schema-on-read for Spark.
+
+### What performance/ingest latency can I expect for Hudi writing?
+
+The speed at which you can write into Hudi depends on the [write 
operation](https://hudi.apache.org/docs/write_operations) and some trade-offs 
you make along the way like file sizing. Just like how databases incur overhead 
over direct/raw file I/O on disks, Hudi operations may have overhead from 
supporting database like features compared to reading/writing raw DFS files. 
That said, Hudi implements advanced techniques from database literature to keep 
these minimal. User is encouraged to h [...]
+
+| Storage Type | Type of workload | Performance | Tips |
+| ---| ---| ---| --- |
+| copy on write | bulk\_insert | Should match vanilla spark writing + an 
additional sort to properly size files | properly size [bulk insert 
parallelism](https://hudi.apache.org/docs/configurations#hoodiebulkinsertshuffleparallelism)
 to get right number of files. use insert if you want this auto tuned . 
Configure 
[hoodie.bulkinsert.sort.mode](https://hudi.apache.org/docs/configurations#hoodiebulkinsertsortmode)
 for better file sizes at the cost of memory. The default value NONE offers th 
[...]
+| copy on write | insert | Similar to bulk insert, except the file sizes are 
auto tuned requiring input to be cached into memory and custom partitioned. | 
Performance would be bound by how parallel you can write the ingested data. 
Tune [this 
limit](https://hudi.apache.org/docs/configurations#hoodieinsertshuffleparallelism)
 up, if you see that writes are happening from only a few executors. |
+| copy on write | upsert/ de-duplicate & insert | Both of these would involve 
index lookup. Compared to naively using Spark (or similar framework)'s JOIN to 
identify the affected records, Hudi indexing is often 7-10x faster as long as 
you have ordered keys (discussed below) or <50% updates. Compared to naively 
overwriting entire partitions, Hudi write can be several magnitudes faster 
depending on how many files in a given partition is actually updated. For e.g, 
if a partition has 1000 fi [...]
+| merge on read | bulk insert | Currently new data only goes to parquet files 
and thus performance here should be similar to copy\_on\_write bulk insert. 
This has the nice side-effect of getting data into parquet directly for query 
performance. [HUDI-86](https://issues.apache.org/jira/browse/HUDI-86) will add 
support for logging inserts directly and speed this up drastically. |  |
+| merge on read | insert | Similar to above |  |
+| merge on read | upsert/ de-duplicate & insert | Indexing performance would 
remain the same as copy-on-write, while updates (the costliest 
I/O operation in copy\_on\_write) are sent to log files; with 
asynchronous compaction this provides very good ingest performance with low 
write amplification. |  |
+
+As with many typical systems that manage time-series data, Hudi performs much 
better if your keys have a timestamp prefix or are monotonically 
increasing/decreasing. You can almost always achieve this. Even if you have 
UUID keys, you can follow tricks like 
[this](https://www.percona.com/blog/2014/12/19/store-uuid-optimized-way/) to 
get keys that are ordered. See also [Tuning 
Guide](https://hudi.apache.org/docs/tuning-guide) for more tips on JVM and 
other configurations.
+
+### What performance can I expect for Hudi reading/queries?
+
+*   For ReadOptimized views, you can expect the same best in-class columnar 
query performance as a standard parquet table in Hive/Spark/Presto
+*   For incremental views, you can expect a speed-up relative to how much data 
usually changes in a given time window and how much time your entire scan 
takes. For example, if only 100 files changed in the last hour in a partition of 
1000 files, then you can expect a speed-up of 10x using incremental pull in Hudi 
compared to fully scanning the partition to find new data.
+*   For real time views, you can expect performance similar to the same avro 
backed table in Hive/Spark/Presto
+
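The 10x figure in the incremental-view bullet above is a simple ratio. A rough sketch of that arithmetic, assuming scan cost is proportional to the number of files read (an assumption of this example, not a measured result; `expectedSpeedup` is a hypothetical helper):

```java
final class IncrementalSpeedupSketch {

    // Rough estimate only: assumes scan cost is proportional to the number of files read.
    static double expectedSpeedup(int totalFiles, int changedFiles) {
        if (changedFiles <= 0 || totalFiles < changedFiles) {
            throw new IllegalArgumentException("expected 0 < changedFiles <= totalFiles");
        }
        return (double) totalFiles / changedFiles;
    }

    public static void main(String[] args) {
        // 100 changed files out of 1000 in the partition, as in the example above.
        System.out.println(expectedSpeedup(1000, 100));
    }
}
```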
+### How do I avoid creating tons of small files?
+
+A key design decision in Hudi was to avoid creating small files and always 
write properly sized files.
+
+There are 2 ways to avoid creating tons of small files in Hudi and both of 
them have different trade-offs:
+
+a) **Auto Size small files during ingestion**: This solution trades 
ingest/writing time to keep queries always efficient. The common approach of 
writing very small files and stitching them together later only solves the 
system scalability issues posed by small files; queries still slow down 
because the small files are exposed to them in the meantime.
+
+Hudi has the ability to maintain a configured target file size, when 
performing **upsert/insert** operations. (Note: **bulk\_insert** operation does 
not provide this functionality and is designed as a simpler replacement for 
normal `spark.write.parquet` )
+
+For **copy-on-write**, this is as simple as configuring the [maximum size for 
a base/parquet 
file](https://hudi.apache.org/docs/configurations#hoodieparquetmaxfilesize) and 
the [soft 
limit](https://hudi.apache.org/docs/configurations#hoodieparquetsmallfilelimit) 
below which a file should be considered a small file. For the initial bootstrap 
to Hudi table, tuning record size estimate is also important to ensure 
sufficient records are bin-packed in a parquet file. For subsequent writes, Hu 
[...]
+
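The interplay of the maximum file size, the small-file limit and the record size estimate can be illustrated with a simplified bin-packing sketch. This is not Hudi's actual implementation: `newFilesNeeded` is a hypothetical helper, the sizes in `main` are illustrative values rather than documented defaults, and the sketch assumes the small-file limit does not exceed the maximum file size.

```java
final class SmallFileSketch {

    // Fills files below the small-file limit first, then counts how many new files
    // are needed for the remaining records. All sizes are in bytes.
    static long newFilesNeeded(long[] existingFileSizes, long smallFileLimit,
                               long maxFileSize, long avgRecordSize, long incomingRecords) {
        long remaining = incomingRecords;
        for (long size : existingFileSizes) {
            if (remaining == 0) {
                break;
            }
            if (size < smallFileLimit) {
                // Records that still fit before this file reaches the maximum size.
                long capacity = (maxFileSize - size) / avgRecordSize;
                remaining -= Math.min(capacity, remaining);
            }
        }
        long recordsPerNewFile = maxFileSize / avgRecordSize;
        // Ceiling division: a partially filled file still counts as one file.
        return (remaining + recordsPerNewFile - 1) / recordsPerNewFile;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long[] existing = {20 * mb, 110 * mb}; // only the 20 MB file is "small"
        System.out.println(newFilesNeeded(existing, 100 * mb, 120 * mb, 1024L, 150_000L));
    }
}
```

A record size estimate that is too large would understate how many records fit in a file and lead to more, smaller files, which is why the estimate matters for the initial load.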
+For **merge-on-read**, there are a few more configs to set. MergeOnRead works 
differently for different INDEX choices.
+
+*   Indexes with **canIndexLogFiles = true** : Inserts of new data go directly 
to log files. In this case, you can configure the [maximum log 
size](https://hudi.apache.org/docs/configurations#hoodielogfilemaxsize) and a 
[factor](https://hudi.apache.org/docs/configurations#hoodielogfiletoparquetcompressionratio)
 that denotes reduction in size when data moves from avro to parquet files.
+*   Indexes with **canIndexLogFiles = false** : Inserts of new data go only to 
parquet files. In this case, the same configurations as above for the 
COPY\_ON\_WRITE case apply.
+
+NOTE : In either case, small files will be auto sized only if there is no 
PENDING compaction or associated log file for that particular file slice. For 
example, for case 1: If you had a log file and a compaction C1 was scheduled to 
convert that log file to parquet, no more inserts can go into that log file. 
For case 2: If you had a parquet file and an update ended up creating an 
associated delta log file, no more inserts can go into that parquet file. Only 
after the compaction has been p [...]
+
+b) 
[**Clustering**](https://hudi.apache.org/blog/2021/01/27/hudi-clustering-intro) 
: This is a feature in Hudi to group small files into larger ones either 
synchronously or asynchronously. Since first solution of auto-sizing small 
files has a tradeoff on ingestion speed (since the small files are sized during 
ingestion), if your use-case is very sensitive to ingestion latency where you 
don't want to compromise on ingestion speed which may end up creating a lot of 
small files, clustering  [...]
+
+_Please note that Hudi always creates immutable files on disk. To be able to 
do auto-sizing or clustering, Hudi will always create a newer version of the 
smaller file, resulting in 2 versions of the same file. The cleaner service 
will later kick in and delete the older version of the small file and keep the latest 
one._
+
+### How do I use DeltaStreamer or Spark DataSource API to write to a 
non-partitioned Hudi table?
+
+Hudi supports writing to non-partitioned tables. For writing to a 
non-partitioned Hudi table and performing hive table syncing, you need to set 
the below configurations in the properties passed:
+
+```plain
+hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.NonpartitionedKeyGenerator
+hoodie.datasource.hive_sync.partition_extractor_class=org.apache.hudi.hive.NonPartitionedExtractor
+```
+
+### How can I reduce table versions created by Hudi in AWS Glue Data Catalog/ 
metastore?
+
+With each commit, Hudi creates a new table version in the metastore. This can 
be reduced by setting the option
+
+[hoodie.datasource.meta\_sync.condition.sync](https://hudi.apache.org/docs/configurations#hoodiedatasourcemeta_syncconditionsync)
 to true.
+
+This will ensure that hive sync is triggered only on schema or partition changes.
+
+### If there are failed writes in my timeline, do I see duplicates?
+
+No, Hudi does not expose uncommitted files/blocks to the readers. Further, 
Hudi strives to automatically manage the table for the user, by actively 
cleaning up files created from failed/aborted writes. See [marker 
mechanism](https://hudi.apache.org/blog/2021/08/18/improving-marker-mechanism/).
+
+### How are conflicts detected in Hudi between multiple writers?
+
+Hudi employs [optimistic concurrency 
control](https://hudi.apache.org/docs/concurrency_control#supported-concurrency-controls)
 between writers, while implementing MVCC based concurrency control between 
writers and the table services. Concurrent writers to the same table need to be 
configured with the same lock provider configuration, to safely perform writes. 
By default (implemented in 
“[SimpleConcurrentFileWritesConflictResolutionStrategy](https://github.com/apache/hudi/blob/master/hudi
 [...]
+
+### Can single-writer inserts have duplicates?
+
+By default, Hudi turns off key based de-duplication for INSERT/BULK\_INSERT 
operations and thus the table could contain duplicates. If users believe they 
have duplicates in inserts, they can either issue UPSERT or consider specifying 
the option to de-duplicate input in either 
[datasource](https://hudi.apache.org/docs/configurations#hoodiedatasourcewriteinsertdropduplicates)
 or 
[deltastreamer](https://github.com/apache/hudi/blob/d3edac4612bde2fa9deca9536801dbc48961fb95/hudi-utilities/src
 [...]
+
+### Can concurrent inserts cause duplicates?
+
+Yes. As mentioned before, the default conflict detection strategy only checks 
for conflicting updates to the same file group IDs. In the case of concurrent 
inserts, inserted records end up creating new file groups and thus can go 
undetected. Most common workload patterns use multi-writer capability in the 
case of running ingestion of new data and concurrently backfilling/deleting 
older data, with NO overlap in the primary keys of the records. However, this 
can be implemented (or better ye [...]
+
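Why concurrent inserts slip past a file-group based check can be shown with a small sketch. It only models the idea described above, namely that a conflict is flagged when two writers touch the same file group ID; `ConflictCheckSketch`, `hasConflict` and the file group IDs are hypothetical and do not reproduce Hudi's conflict resolution classes.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

final class ConflictCheckSketch {

    // Two writers conflict only if the sets of file group IDs they touched overlap.
    static boolean hasConflict(Set<String> fileGroupsTouchedByA, Set<String> fileGroupsTouchedByB) {
        return !Collections.disjoint(fileGroupsTouchedByA, fileGroupsTouchedByB);
    }

    public static void main(String[] args) {
        // Concurrent updates to the same existing file group are detected.
        Set<String> updateA = new HashSet<>(Arrays.asList("fg-1", "fg-2"));
        Set<String> updateB = new HashSet<>(Arrays.asList("fg-2"));
        System.out.println(hasConflict(updateA, updateB));

        // Concurrent inserts each create their own new file groups, so the sets never
        // overlap, even if both writers inserted the same record key.
        Set<String> insertA = new HashSet<>(Arrays.asList("fg-new-a"));
        Set<String> insertB = new HashSet<>(Arrays.asList("fg-new-b"));
        System.out.println(hasConflict(insertA, insertB));
    }
}
```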
+## Querying Tables
+
+### Do deleted records appear in Hudi's incremental query results?
+
+Soft Deletes (unlike hard deletes) do appear in the incremental pull query 
results. So, if you need a mechanism to propagate deletes to downstream tables, 
you can use Soft deletes.
+
 ### How do I pass hudi configurations to my beeline Hive queries?
 
 If Hudi's input format is not picked the returned results may be incorrect. To 
ensure correct inputformat is picked, please use 
`org.apache.hadoop.hive.ql.io.HiveInputFormat` or 
`org.apache.hudi.hadoop.hive.HoodieCombineHiveInputFormat` for 
`hive.input.format` config. This can be set like shown below:
-```java
+
+```plain
 set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat
 ```
 
 or
 
-```java
+```plain
 set hive.input.format=org.apache.hudi.hadoop.hive.HoodieCombineHiveInputFormat
 ```
 
-### Can I register my Hudi dataset with Apache Hive metastore?
+### Does Hudi guarantee consistent reads? How to think about read optimized 
queries?
 
-Yes. This can be performed either via the standalone [Hive Sync 
tool](https://hudi.apache.org/docs/syncing_metastore#hive-sync-tool) or using 
options in [Hudi 
Streamer](https://github.com/apache/hudi/blob/d3edac4612bde2fa9deca9536801dbc48961fb95/docker/demo/sparksql-incremental.commands#L50)
 tool or 
[datasource](https://hudi.apache.org/docs/configurations#hoodiedatasourcehive_syncenable).
+Hudi does offer consistent reads. To read the latest snapshot of a MOR table, 
a user should use snapshot query. The [read-optimized 
queries](https://hudi.apache.org/docs/table_types#query-types) (targeted for 
the MOR table ONLY) are an add-on benefit that provides users with a practical 
tradeoff, decoupling writer performance from query performance and leveraging the 
fact that most queries read, say, the most recent data in the table.
 
-### How does the Hudi indexing work & what are its benefits? 
+Hudi’s read-optimized query is targeted for the MOR table only, with guidance 
around how compaction should be run to achieve predictable results. In the MOR 
table, the compaction, which runs every few commits (or “deltacommit” to be 
exact for the MOR table) by default, merges the base (parquet) file and 
corresponding change log files to a new base file within each file group, so 
that the snapshot query serving the latest data immediately after compaction 
reads the base files only.  Simil [...]
 
-The indexing component is a key part of the Hudi writing and it maps a given 
recordKey to a fileGroup inside Hudi consistently. This enables faster 
identification of the file groups that are affected/dirtied by a given write 
operation.
+Users must use snapshot queries to read the latest snapshot of a MOR table.  
Popular engines including Spark, Presto, and Hive already support snapshot 
queries on MOR table and the snapshot query support in Trino is in progress 
(the [PR](https://github.com/trinodb/trino/pull/14786) is under review).  Note 
that the read-optimized query does not apply to the COW table.
 
-Hudi supports a few options for indexing as below
-
- - *HoodieBloomIndex * : Uses a bloom filter and ranges information placed in 
the footer of parquet/base files (and soon log files as well)
- - *HoodieGlobalBloomIndex* : The non global indexing only enforces uniqueness 
of a key inside a single partition i.e the user is expected to know the 
partition under which a given record key is stored. This helps the indexing 
scale very well for even [very large 
datasets](https://eng.uber.com/uber-big-data-platform/). However, in some 
cases, it might be necessary instead to do the de-duping/enforce uniqueness 
across all partitions and the global bloom index does exactly that. If this is 
[...]
- - *HBaseIndex* : Apache HBase is a key value store, typically found in close 
proximity to HDFS. You can also store the index inside HBase, which could be 
handy if you are already operating HBase.
- - *HoodieSimpleIndex (default)* : A simple index which reads interested 
fields (record key and partition path) from base files and joins with incoming 
records to find the tagged location.
- - *HoodieGlobalSimpleIndex* : Global version of Simple Index, where in 
uniqueness is on record key across entire table. 
- - *HoodieBucketIndex* : Each partition has statically defined buckets to 
which records are tagged with. Since locations are tagged via hashing 
mechanism, this index lookup will be very efficient. 
- - *HoodieSparkConsistentBucketIndex* : This is also similar to Bucket Index. 
Only difference is that, data skews can be tackled by dynamically changing the 
bucket number.  
-
-You can implement your own index if you'd like, by subclassing the 
`HoodieIndex` class and configuring the index class name in configs. 
+## Table Services
 
 ### What does the Hudi cleaner do?
 
-The Hudi cleaner process often runs right after a commit and deltacommit and 
goes about deleting old files that are no longer needed. If you are using the 
incremental pull feature, then ensure you configure the cleaner to [retain 
sufficient amount of last 
commits](https://hudi.apache.org/docs/configurations#hoodiecleanercommitsretained)
 to rewind. Another consideration is to provide sufficient time for your long 
running jobs to finish running. Otherwise, the cleaner could delete a file t 
[...]
-
-### What's Hudi's schema evolution story?
-
-Hudi uses Avro as the internal canonical representation for records, primarily 
due to its nice [schema compatibility & 
evolution](https://docs.confluent.io/platform/current/schema-registry/avro.html)
 properties. This is a key aspect of having reliability in your ingestion or 
ETL pipelines. As long as the schema passed to Hudi (either explicitly in Hudi 
Streamer schema provider configs or implicitly by Spark Datasource's Dataset 
schemas) is backwards compatible (e.g no field deletes, only [...]
+The Hudi cleaner process often runs right after a commit and deltacommit and 
goes about deleting old files that are no longer needed. If you are using the 
incremental pull feature, then ensure you configure the cleaner to [retain 
sufficient amount of last 
commits](https://hudi.apache.org/docs/configurations#hoodiecleanercommitsretained)
 to rewind. Another consideration is to provide sufficient time for your long 
running jobs to finish running. Otherwise, the cleaner could delete a file t 
[...]
 
-### How do I run compaction for a MOR dataset?
+### How do I run compaction for a MOR table?
 
-Simplest way to run compaction on MOR dataset is to run the [compaction 
inline](https://hudi.apache.org/docs/configurations#hoodiecompactinline), at 
the cost of spending more time ingesting; This could be particularly useful, in 
common cases where you have small amount of late arriving data trickling into 
older partitions. In such a scenario, you may want to just aggressively compact 
the last N partitions while waiting for enough logs to accumulate for older 
partitions. The net effect is [...]
+Simplest way to run compaction on MOR table is to run the [compaction 
inline](https://hudi.apache.org/docs/configurations#hoodiecompactinline), at 
the cost of spending more time ingesting; This could be particularly useful, in 
common cases where you have small amount of late arriving data trickling into 
older partitions. In such a scenario, you may want to just aggressively compact 
the last N partitions while waiting for enough logs to accumulate for older 
partitions. The net effect is t [...]
 
-That said, for obvious reasons of not blocking ingesting for compaction, you 
may want to run it asynchronously as well. This can be done either via a 
separate [compaction 
job](https://github.com/apache/hudi/blob/master/hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieCompactor.java)
 that is scheduled by your workflow scheduler/notebook independently. If you 
are using Hudi Streamer, then you can run in [continuous 
mode](https://github.com/apache/hudi/blob/d3edac4612bde2fa9deca [...]
+That said, for obvious reasons of not blocking ingesting for compaction, you 
may want to run it asynchronously as well. This can be done either via a 
separate [compaction 
job](https://github.com/apache/hudi/blob/master/hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieCompactor.java)
 that is scheduled by your workflow scheduler/notebook independently. If you 
are using delta streamer, then you can run in [continuous 
mode](https://github.com/apache/hudi/blob/d3edac4612bde2fa9dec [...]
 
-### What options do I have for asynchronous/offline compactions on MOR dataset?
+### What options do I have for asynchronous/offline compactions on MOR table?
 
 There are a couple of options depending on how you write to Hudi. But first 
let us understand briefly what is involved. There are two parts to compaction
-- Scheduling: In this step, Hudi scans the partitions and selects file slices 
to be compacted. A compaction plan is finally written to Hudi timeline. 
Scheduling needs tighter coordination with other writers (regular ingestion is 
considered one of the writers). If scheduling is done inline with the ingestion 
job, this coordination is automatically taken care of. Else when scheduling 
happens asynchronously a lock provider needs to be configured for this 
coordination among multiple writers.
-- Execution: In this step the compaction plan is read and file slices are 
compacted. Execution doesnt need the same level of coordination with other 
writers as Scheduling step and can be decoupled from ingestion job easily.
+
+*   Scheduling: In this step, Hudi scans the partitions and selects file 
slices to be compacted. A compaction plan is finally written to Hudi timeline. 
Scheduling needs tighter coordination with other writers (regular ingestion is 
considered one of the writers). If scheduling is done inline with the ingestion 
job, this coordination is automatically taken care of. Otherwise, when scheduling 
happens asynchronously, a lock provider needs to be configured for this 
coordination among multiple writers.
+*   Execution: In this step the compaction plan is read and file slices are 
compacted. Execution doesn't need the same level of coordination with other 
writers as the Scheduling step and can be decoupled from the ingestion job easily.
 
 Depending on how you write to Hudi these are the possible options currently.
-- Hudi Streamer:
-   - In Continuous mode, asynchronous compaction is achieved by default. Here 
scheduling is done by the ingestion job inline and compaction execution is 
achieved asynchronously by a separate parallel thread.
-   - In non continuous mode, only inline compaction is possible. 
-   - Please note in either mode, by passing --disable-compaction compaction is 
completely disabled
-- Spark datasource:
-   - Async scheduling and async execution can be achieved by periodically 
running an offline Hudi Compactor Utility or Hudi CLI. However this needs a 
lock provider to be configured.
-   - Alternately, from 0.11.0, to avoid dependency on lock providers, 
scheduling alone can be done inline by regular writer using the config 
`hoodie.compact.schedule.inline` . And compaction execution can be done offline 
by periodically triggering the Hudi Compactor Utility or Hudi CLI.
-- Spark structured streaming:
-   - Compactions are scheduled and executed asynchronously inside the 
streaming job. Async Compactions are enabled by default for structured 
streaming jobs on Merge-On-Read table.
-   - Please note it is not possible to disable async compaction for MOR 
dataset with spark structured streaming. 
-- Flink:
-   - Async compaction is enabled by default for Merge-On-Read table.
-   - Offline compaction can be achieved by setting 
```compaction.async.enabled``` to ```false``` and periodically running [Flink 
offline 
Compactor](https://hudi.apache.org/docs/next/compaction/#flink-offline-compaction).
 When running the offline compactor, one needs to ensure there are no active 
writes to the table.
-   - Third option (highly recommended over the second one) is to schedule the 
compactions from the regular ingestion job and executing the compaction plans 
from an offline job. To achieve this set ```compaction.async.enabled``` to 
```false```, ```compaction.schedule.enabled``` to ```true``` and then run the 
[Flink offline 
Compactor](https://hudi.apache.org/docs/next/compaction/#flink-offline-compaction)
 periodically to execute the plans.
+
+*   DeltaStreamer:
+  *   In Continuous mode, asynchronous compaction is achieved by default. Here 
scheduling is done by the ingestion job inline and compaction execution is 
achieved asynchronously by a separate parallel thread.
+  *   In non continuous mode, only inline compaction is possible.
+  *   Please note that in either mode, passing --disable-compaction disables 
compaction completely.
+*   Spark datasource:
+  *   Async scheduling and async execution can be achieved by periodically 
running an offline Hudi Compactor Utility or Hudi CLI. However this needs a 
lock provider to be configured.
+  *   Alternately, from 0.11.0, to avoid dependency on lock providers, 
scheduling alone can be done inline by regular writer using the config 
`hoodie.compact.schedule.inline` . And compaction execution can be done offline 
by periodically triggering the Hudi Compactor Utility or Hudi CLI.
+*   Spark structured streaming:
+  *   Compactions are scheduled and executed asynchronously inside the 
streaming job. Async Compactions are enabled by default for structured 
streaming jobs on Merge-On-Read table.
+  *   Please note it is not possible to disable async compaction for MOR table 
with spark structured streaming.
+*   Flink:
+  *   Async compaction is enabled by default for Merge-On-Read table.
+  *   Offline compaction can be achieved by setting `compaction.async.enabled` 
to `false` and periodically running [Flink offline 
Compactor](https://hudi.apache.org/docs/next/compaction/#flink-offline-compaction).
 When running the offline compactor, one needs to ensure there are no active 
writes to the table.
+  *   Third option (highly recommended over the second one) is to schedule the 
compactions from the regular ingestion job and execute the compaction plans 
from an offline job. To achieve this set `compaction.async.enabled` to `false`, 
`compaction.schedule.enabled` to `true` and then run the [Flink offline 
Compactor](https://hudi.apache.org/docs/next/compaction/#flink-offline-compaction)
 periodically to execute the plans.
 
 ### How to disable all table services in case of multiple writers?
 
-[hoodie.table.services.enabled](https://hudi.apache.org/docs/configurations/#hoodietableservicesenabled)
 is an umbrella config that can be used to turn off all table services at once 
without having to individually disable them. This is handy in use cases where 
there are multiple writers doing ingestion. While one of the main pipelines can 
take care of the table services, other ingestion pipelines can disable them to 
avoid frequent trigger of cleaning/clustering etc. This does not apply t [...]
+[hoodie.table.services.enabled](https://hudi.apache.org/docs/configurations#hoodietableservicesenabled)
 is an umbrella config that can be used to turn off all table services at once 
without having to individually disable them. This is handy in use cases where 
there are multiple writers doing ingestion. While one of the main pipelines can 
take care of the table services, other ingestion pipelines can disable them to 
avoid frequent trigger of cleaning/clustering etc. This does not apply to [...]
 
-### What performance/ingest latency can I expect for Hudi writing?
+### Why does Hudi retain at least one previous commit even after setting 
`hoodie.cleaner.commits.retained`: 1?
 
-The speed at which you can write into Hudi depends on the [write 
operation](https://hudi.apache.org/docs/write_operations) and some trade-offs 
you make along the way like file sizing. Just like how databases incur overhead 
over direct/raw file I/O on disks,  Hudi operations may have overhead from 
supporting  database like features compared to reading/writing raw DFS files. 
That said, Hudi implements advanced techniques from database literature to keep 
these minimal. User is encouraged to [...]
+Hudi runs the cleaner to remove old file versions as part of writing data, either 
inline or in asynchronous mode (0.6.0 onwards). The Hudi cleaner retains 
at least one previous commit when cleaning old file versions. This is to 
prevent the case where concurrently running queries which are reading the latest 
file versions suddenly see those files getting deleted by the cleaner because a new 
file version got added. In other words, retaining at least one previous commit 
is needed for ensuring snapsho [...]
 
-| Storage Type | Type of workload | Performance | Tips |
-|-------|--------|--------|--------|
-| copy on write | bulk_insert | Should match vanilla spark writing + an 
additional sort to properly size files | properly size [bulk insert 
parallelism](https://hudi.apache.org/docs/configurations#hoodiebulkinsertshuffleparallelism)
 to get right number of files. use insert if you want this auto tuned . 
Configure 
[hoodie.bulkinsert.sort.mode](https://hudi.apache.org/docs/configurations#hoodiebulkinsertsortmode)
 for better file sizes at the cost of memory. The default value NONE offers the 
[...]
-| copy on write | insert | Similar to bulk insert, except the file sizes are 
auto tuned requiring input to be cached into memory and custom partitioned. | 
Performance would be bound by how parallel you can write the ingested data. 
Tune [this 
limit](https://hudi.apache.org/docs/configurations#hoodieinsertshuffleparallelism)
 up, if you see that writes are happening from only a few executors. |
-| copy on write | upsert/ de-duplicate & insert | Both of these would involve 
index lookup.  Compared to naively using Spark (or similar framework)'s JOIN to 
identify the affected records, Hudi indexing is often 7-10x faster as long as 
you have ordered keys (discussed below) or <50% updates. Compared to naively 
overwriting entire partitions, Hudi write can be several magnitudes faster 
depending on how many files in a given partition is actually updated. For e.g, 
if a partition has 1000 f [...]
-| merge on read | bulk insert | Currently new data only goes to parquet files 
and thus performance here should be similar to copy_on_write bulk insert. This 
has the nice side-effect of getting data into parquet directly for query 
performance. [HUDI-86](https://issues.apache.org/jira/browse/HUDI-86) will add 
support for logging inserts directly and this up drastically. | |
-| merge on read | insert | Similar to above | |
-| merge on read | upsert/ de-duplicate & insert | Indexing performance would 
remain the same as copy-on-write, while ingest latency for updates (costliest 
I/O operation in copy_on_write) are sent to log files and thus with 
asynchronous compaction provides very very good ingest performance with low 
write amplification. | |
+### Can I get notified when new commits happen in my Hudi table?
 
-Like with many typical system that manage time-series data, Hudi performs much 
better if your keys have a timestamp prefix or monotonically 
increasing/decreasing. You can almost always achieve this. Even if you have 
UUID keys, you can follow tricks like 
[this](https://www.percona.com/blog/2014/12/19/store-uuid-optimized-way/) to 
get keys that are ordered. See also [Tuning 
Guide](https://cwiki.apache.org/confluence/display/HUDI/Tuning+Guide) for more 
tips on JVM and other configurations. 
+Yes. Hudi provides the ability to post a callback notification about a write 
commit. You can use a http hook or choose to
 
-### What performance can I expect for Hudi reading/queries?
+be notified via a Kafka/pulsar topic or plug in your own implementation to get 
notified. Please refer 
[here](https://hudi.apache.org/docs/next/writing_data/#commit-notifications)
 
- - For ReadOptimized views, you can expect the same best in-class columnar 
query performance as a standard parquet table in Hive/Spark/Presto
- - For incremental views, you can expect speed up relative to how much data 
usually changes in a given time window and how much time your entire scan 
takes. For e.g, if only 100 files changed in the last hour in a partition of 
1000 files, then you can expect a speed of 10x using incremental pull in Hudi 
compared to full scanning the partition to find out new data.
- - For real time views, you can expect performance similar to the same avro 
backed table in Hive/Spark/Presto 
+for details
 
-### How do I to avoid creating tons of small files?
+## Storage
 
-A key design decision in Hudi was to avoid creating small files and always 
write properly sized files.
+### Does Hudi support cloud storage/object stores?
 
-There are 2 ways to avoid creating tons of small files in Hudi and both of 
them have different trade-offs:
+Yes. Generally speaking, Hudi is able to provide its functionality on any 
Hadoop FileSystem implementation and thus can read and write tables on [Cloud 
stores](https://hudi.apache.org/docs/cloud) (Amazon S3 or Microsoft Azure or 
Google Cloud Storage). Over time, Hudi has also incorporated specific design 
aspects that make building Hudi tables on the cloud easy, such as [consistency 
checks for 
s3](https://hudi.apache.org/docs/configurations#hoodieconsistencycheckenabled), 
Zero moves/renam [...]
 
-a) **Auto Size small files during ingestion**: This solution trades 
ingest/writing time to keep queries always efficient. Common approaches to 
writing very small files and then later stitching them together only solve for 
system scalability issues posed by small files and also let queries slow down 
by exposing small files to them anyway.
+### What is the difference between copy-on-write (COW) vs merge-on-read (MOR) 
table types?
 
-Hudi has the ability to maintain a configured target file size, when 
performing **upsert/insert** operations. (Note: **bulk_insert** operation does 
not provide this functionality and is designed as a simpler replacement for 
normal `spark.write.parquet`  )
+**Copy On Write** - This storage type enables clients to ingest data on 
columnar file formats, currently parquet. Any new data that is written to the 
Hudi table using COW storage type, will write new parquet files. Updating an 
existing set of rows will result in a rewrite of the entire parquet files that 
collectively contain the affected rows being updated. Hence, all writes to such 
tables are limited by parquet writing performance, the larger the parquet file, 
the higher is the time tak [...]
 
-For **copy-on-write**, this is as simple as configuring the [maximum size for 
a base/parquet 
file](https://hudi.apache.org/docs/configurations#hoodieparquetmaxfilesize) and 
the [soft 
limit](https://hudi.apache.org/docs/configurations#hoodieparquetsmallfilelimit) 
below which a file should be considered a small file. For the initial bootstrap 
to Hudi table, tuning record size estimate is also important to ensure 
sufficient records are bin-packed in a parquet file. For subsequent writes, Hu 
[...]
+**Merge On Read** - This storage type enables clients to ingest data quickly 
onto row based data format such as avro. Any new data that is written to the 
Hudi table using MOR table type, will write new log/delta files that internally 
store the data as avro encoded bytes. A compaction process (configured as 
inline or asynchronous) will convert log file format to columnar file format 
(parquet). Two different InputFormats expose 2 different views of this data, 
Read Optimized view exposes co [...]
 
-For **merge-on-read**, there are few more configs to set. MergeOnRead works 
differently for different INDEX choices.
+More details can be found [here](https://hudi.apache.org/docs/concepts/) and 
also [Design And 
Architecture](https://cwiki.apache.org/confluence/display/HUDI/Design+And+Architecture).
 
- - Indexes with **canIndexLogFiles = true** : Inserts of new data go directly 
to log files. In this case, you can configure the [maximum log 
size](https://hudi.apache.org/docs/configurations#hoodielogfilemaxsize) and a 
[factor](https://hudi.apache.org/docs/configurations#hoodielogfiletoparquetcompressionratio)
 that denotes reduction in size when data moves from avro to parquet files.
- - Indexes with **canIndexLogFiles = false** : Inserts of new data go only to 
parquet files. In this case, the same configurations as above for the 
COPY_ON_WRITE case applies.
+### How do I migrate my data to Hudi?
 
-NOTE : In either case, small files will be auto sized only if there is no 
PENDING compaction or associated log file for that particular file slice. For 
example, for case 1: If you had a log file and a compaction C1 was scheduled to 
convert that log file to parquet, no more inserts can go into that log file. 
For case 2: If you had a parquet file and an update ended up creating an 
associated delta log file, no more inserts can go into that parquet file. Only 
after the compaction has been p [...]
+Hudi provides built-in support for a one-time rewrite of your entire table into 
Hudi using the HDFSParquetImporter tool available from the hudi-cli. You 
could also do this via a simple read and write of the dataset using the Spark 
datasource APIs. Once migrated, writes can be performed using normal means 
discussed 
[here](https://hudi.apache.org/docs/faq#what-are-some-ways-to-write-a-hudi-table).
 This topic is discussed in detail 
[here](https://hudi.apache.org/docs/migration_guide/), includ [...]
 
-b) 
**[Clustering](https://hudi.apache.org/blog/2021/01/27/hudi-clustering-intro)** 
: This is a feature in Hudi to group small files into larger ones either 
synchronously or asynchronously. Since first solution of auto-sizing small 
files has a tradeoff on ingestion speed (since the small files are sized during 
ingestion), if your use-case is very sensitive to ingestion latency where you 
don't want to compromise on ingestion speed which may end up creating a lot of 
small files, clustering  [...]
+### How to convert an existing COW table to MOR?
 
-*Please note that Hudi always creates immutable files on disk. To be able to 
do auto-sizing or clustering, Hudi will always create a newer version of the 
smaller file, resulting in 2 versions of the same file. The cleaner service 
will later kick in and delte the older version small file and keep the latest 
one.* 
+All you need to do is to edit the table type property in 
hoodie.properties (located at hudi_table_path/.hoodie/hoodie.properties).
 
-### Why does Hudi retain at-least one previous commit even after setting 
hoodie.cleaner.commits.retained': 1 ?
+But manually changing it will result in checksum errors. So, we have to go via 
hudi-cli.
 
-Hudi runs cleaner to remove old file versions as part of writing data either 
in inline or in asynchronous mode (0.6.0 onwards). Hudi Cleaner retains 
at-least one previous commit when cleaning old file versions. This is to 
prevent the case when concurrently running queries which are reading the latest 
file versions suddenly  see those files getting deleted by cleaner because a 
new file version got added . In other words, retaining at-least one previous 
commit is needed for ensuring snapsh [...]
+1. Copy existing hoodie.properties to a new location.
+2. Edit table type to MERGE\_ON\_READ
+3. Launch hudi-cli
+  1. connect --path hudi\_table\_path
+  2. repair overwrite-hoodie-props --new-props-file new\_hoodie.properties
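
Steps 1 and 2 above can be sketched in Python as follows. This is a minimal sketch, not part of Hudi: the helper name `write_mor_props` is hypothetical, and it assumes the table type is stored under the `hoodie.table.type` key as a plain `key=value` line. It only prepares the new properties file; the change itself still has to be applied through hudi-cli as in step 3.

```python
from pathlib import Path


def write_mor_props(src: Path, dst: Path) -> None:
    """Copy src to dst, setting hoodie.table.type to MERGE_ON_READ.

    Hypothetical helper: the original hoodie.properties is left untouched.
    """
    out, found = [], False
    for line in src.read_text().splitlines():
        # Compare only the key part so comment lines are never rewritten.
        key = line.split("=", 1)[0].strip()
        if key == "hoodie.table.type":
            line, found = "hoodie.table.type=MERGE_ON_READ", True
        out.append(line)
    if not found:
        raise ValueError(f"hoodie.table.type not found in {src}")
    dst.write_text("\n".join(out) + "\n")
```

Call it with a local copy of `hoodie.properties` as `src` and `new_hoodie.properties` as `dst`, then pass the resulting file to `repair overwrite-hoodie-props`.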
 
-### How do I use Hudi Streamer or Spark DataSource API to write to a 
Non-partitioned Hudi dataset ?
+### How can I find the average record size in a commit?
 
-Hudi supports writing to non-partitioned datasets. For writing to a 
non-partitioned Hudi dataset and performing hive table syncing, you need to set 
the below configurations in the properties passed:
+The `commit showpartitions` command in [HUDI 
CLI](https://hudi.apache.org/docs/cli) will show both "bytes written" and
 
-```java
-hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.NonpartitionedKeyGenerator
-hoodie.datasource.hive_sync.partition_extractor_class=org.apache.hudi.hive.NonPartitionedExtractor
-```
+"records inserted." Divide the bytes written by records inserted to find the 
average size. Note that this answer assumes
 
-### Why do we have to set 2 different ways of configuring Spark to work with 
Hudi?
+metadata overhead is negligible. For a small table (such as 5 columns, 100 
records) this will not be the case.
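
The division described above can be written as a small helper. This is only an illustrative sketch: the function name and the sample totals are made up, and the two inputs are assumed to be copied by hand from the CLI output for one commit.

```python
def average_record_size(bytes_written: int, records_inserted: int) -> float:
    """Approximate bytes per record for one commit, ignoring metadata overhead."""
    if records_inserted <= 0:
        raise ValueError("records_inserted must be positive")
    return bytes_written / records_inserted


# Hypothetical totals for a single commit (128 MiB written, 131072 records).
print(average_record_size(bytes_written=134_217_728, records_inserted=131_072))
```

For very small commits the result overstates the true record size, because file footers and other metadata make up a larger share of the bytes written.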
 
-Non-Hive engines tend to do their own listing of DFS to query datasets. For 
e.g Spark starts reading the paths direct from the file system (HDFS or S3).
+### How does the Hudi indexing work & what are its benefits?
 
-From Spark the calls would be as below:
-- org.apache.spark.rdd.NewHadoopRDD.getPartitions
-- org.apache.parquet.hadoop.ParquetInputFormat.getSplits
-- org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits
-
-Without understanding of Hudi's file layout, engines would just plainly 
reading all the parquet files and displaying the data within them, with massive 
amounts of duplicates in the result.
+The indexing component is a key part of the Hudi write path. It consistently maps 
a given recordKey to a fileGroup inside Hudi. This enables faster 
identification of the file groups that are affected/dirtied by a given write 
operation.
 
-At a high level, there are two ways of configuring a query engine to properly 
read Hudi datasets
+Hudi supports a few options for indexing as below
 
-A) Making them invoke methods in `HoodieParquetInputFormat#getSplits` and 
`HoodieParquetInputFormat#getRecordReader`
+*   _HoodieBloomIndex_ : Uses a bloom filter and ranges information placed in 
the footer of parquet/base files (and soon log files as well)
+*   _HoodieGlobalBloomIndex_ : The non global indexing only enforces 
uniqueness of a key inside a single partition i.e the user is expected to know 
the partition under which a given record key is stored. This helps the indexing 
scale very well for even [very large 
datasets](https://eng.uber.com/uber-big-data-platform/). However, in some 
cases, it might be necessary instead to do the de-duping/enforce uniqueness 
across all partitions and the global bloom index does exactly that. If this i 
[...]
+*   _HBaseIndex_ : Apache HBase is a key value store, typically found in close 
proximity to HDFS. You can also store the index inside HBase, which could be 
handy if you are already operating HBase.
+*   _HoodieSimpleIndex (default)_ : A simple index which reads interested 
fields (record key and partition path) from base files and joins with incoming 
records to find the tagged location.
+*   _HoodieGlobalSimpleIndex_ : Global version of Simple Index, wherein 
uniqueness is enforced on the record key across the entire table.
+*   _HoodieBucketIndex_ : Each partition has statically defined buckets to 
which records are tagged. Since locations are tagged via a hashing 
mechanism, this index lookup will be very efficient.
+*   _HoodieSparkConsistentBucketIndex_ : This is similar to Bucket Index. 
The only difference is that data skew can be tackled by dynamically changing the 
bucket number.
 
-- Hive does this natively, since the InputFormat is the abstraction in Hive to 
plugin new table formats. HoodieParquetInputFormat extends 
MapredParquetInputFormat which is nothing but a input format for hive and we 
register Hudi tables to Hive metastore backed by these input formats
-- Presto also falls back to calling the input format when it sees a 
`UseFileSplitsFromInputFormat` annotation, to just obtain splits, but then goes 
on to use its own optimized/vectorized parquet reader for queries on 
Copy-on-Write tables
-- Spark can be forced into falling back to the HoodieParquetInputFormat class, 
using --conf spark.sql.hive.convertMetastoreParquet=false
+You can implement your own index if you'd like, by subclassing the 
`HoodieIndex` class and configuring the index class name in configs.
 
+### Can I switch from one index type to another without having to rewrite the 
entire table?
 
-B) Making the engine invoke a path filter or other means to directly call Hudi 
classes to filter the files on DFS and pick out the latest file slice
+It should be okay to switch between Bloom index and Simple index as long as 
they are not global.
 
-- Even though we can force Spark to fallback to using the InputFormat class, 
we could lose ability to use Spark's optimized parquet reader path by doing so. 
-- To keep benefits of native parquet read performance, we set the  
`HoodieROTablePathFilter` as a path filter, explicitly set this in the Spark 
Hadoop Configuration.There is logic in the file: to ensure that folders (paths) 
or files for Hoodie related files always ensures that latest file slice is 
selected. This filters out duplicate entries and shows latest entries for each 
record. 
+Moving from global to non-global and vice versa may not work. Also, switching 
between HBase (global index) and regular bloom might not work.
 
 ### I have an existing dataset and want to evaluate Hudi using portion of that 
data ?
 
 You can bulk import portion of that data to a new hudi table. For example, if 
you want to try on a month of data -
 
-```java
+```scala
 spark.read.parquet("your_data_set/path/to/month")
      .write.format("org.apache.hudi")
      .option("hoodie.datasource.write.operation", "bulk_insert")
@@ -411,7 +483,7 @@ spark.read.parquet("your_data_set/path/to/month")
 
 Once you have the initial copy, you can simply run upsert operations on this 
by selecting some sample of data every round
 
-```java
+```scala
 spark.read.parquet("your_data_set/path/to/month").limit(n) // Limit n records
      .write.format("org.apache.hudi")
      .option("hoodie.datasource.write.operation", "upsert")
@@ -424,79 +496,17 @@ 
spark.read.parquet("your_data_set/path/to/month").limit(n) // Limit n records
 
 For merge on read table, you may want to also try scheduling and running 
compaction jobs. You can run compaction directly using spark submit on 
org.apache.hudi.utilities.HoodieCompactor or by using [HUDI 
CLI](https://hudi.apache.org/docs/cli).
 
-### If I keep my file versions at 1, with this configuration will i be able to 
do a roll back (to the last commit) when write fail?
+### Why does Hudi maintain record level commit metadata? Isn't tracking table 
version at file level good enough?
 
-Yes, Commits happen before cleaning. Any failed commits will not cause any 
side-effects and Hudi will guarantee snapshot isolation.
-
-### Does AWS GLUE  support Hudi ?
-
-AWS Glue jobs can write, read and update Glue Data Catalog for hudi tables. In 
order to successfully integrate with Glue Data Catalog, you need to subscribe 
to one of the AWS provided Glue connectors named "AWS Glue Connector for Apache 
Hudi". Glue job needs to have "Use Glue data catalog as the Hive metastore" 
option ticked. Detailed steps with a sample scripts is available on this 
article provided by AWS - 
https://aws.amazon.com/blogs/big-data/writing-to-apache-hudi-tables-using-aws-gl
 [...]
-
-In case if your using either notebooks or Zeppelin through Glue dev-endpoints, 
your script might not be able to integrate with Glue DataCatalog when writing 
to hudi tables.
-
-### How to override Hudi jars in EMR?
-
-If you are looking to override Hudi jars in your EMR clusters one way to 
achieve this is by providing the Hudi jars through a bootstrap script. 
-Here are the example steps for overriding Hudi version 0.7.0 in EMR 0.6.2. 
-
-**Build Hudi Jars:**
-```shell script
-# Git clone
-git clone https://github.com/apache/hudi.git && cd hudi   
-
-# Get version 0.7.0
-git checkout --track origin/release-0.7.0
-
-# Build jars with spark 3.0.0 and scala 2.12 (since emr 6.2.0 uses spark 3 
which requires scala 2.12):
-mvn clean package -DskipTests -Dspark3  -Dscala-2.12 -T 30 
-```
-
-**Copy jars to s3:**
-These are the jars we are interested in after build completes. Copy them to a 
temp location first.
-
-```shell script
-mkdir -p ~/Downloads/hudi-jars
-cp packaging/hudi-hadoop-mr-bundle/target/hudi-hadoop-mr-bundle-0.7.0.jar 
~/Downloads/hudi-jars/
-cp packaging/hudi-hive-sync-bundle/target/hudi-hive-sync-bundle-0.7.0.jar 
~/Downloads/hudi-jars/
-cp packaging/hudi-spark-bundle/target/hudi-spark-bundle_2.12-0.7.0.jar 
~/Downloads/hudi-jars/
-cp 
packaging/hudi-timeline-server-bundle/target/hudi-timeline-server-bundle-0.7.0.jar
 ~/Downloads/hudi-jars/
-cp packaging/hudi-utilities-bundle/target/hudi-utilities-bundle_2.12-0.7.0.jar 
~/Downloads/hudi-jars/
-```
-
-Upload  all jars from ~/Downloads/hudi-jars/ to the s3 location 
s3://xxx/yyy/hudi-jars
-
-**Include Hudi jars as part of the emr bootstrap script:**
-Below script downloads Hudi jars from above s3 location. Use this script as 
part `bootstrap-actions` when launching the EMR cluster to install the jars in 
each node.
-
-```shell script
-#!/bin/bash
-sudo mkdir -p /mnt1/hudi-jars
-
-sudo aws s3 cp s3://xxx/yyy/hudi-jars /mnt1/hudi-jars --recursive
-
-# create symlinks
-cd /mnt1/hudi-jars
-sudo ln -sf hudi-hadoop-mr-bundle-0.7.0.jar hudi-hadoop-mr-bundle.jar
-sudo ln -sf hudi-hive-sync-bundle-0.7.0.jar hudi-hive-sync-bundle.jar
-sudo ln -sf hudi-spark-bundle_2.12-0.7.0.jar hudi-spark-bundle.jar
-sudo ln -sf hudi-timeline-server-bundle-0.7.0.jar 
hudi-timeline-server-bundle.jar
-sudo ln -sf hudi-utilities-bundle_2.12-0.7.0.jar hudi-utilities-bundle.jar
-```
-
-**Using the overriden jar in Hudi Streamer:**
-When invoking Hudi Streamer specify the above jar location as part of 
spark-submit command.
+By generating a commit time ahead of time, Hudi is able to stamp each record 
with what is effectively a transaction id identifying the commit it is part of, 
enabling record level change tracking. This means that even if that file is 
compacted/clustered ([they mean different things in 
Hudi](https://hudi.apache.org/docs/clustering#how-is-compaction-different-from-clustering))
 many times, in between incremental queries, we are able to [preserve history 
of the records](https://hudi.apache.org/blog/2023/05/ [...]
 
 ### Why partition fields are also stored in parquet files in addition to the 
partition path ?
 
 Hudi supports customizable partition values which could be a derived value of 
another field. Also, storing the partition value only as part of the field 
results in losing type information when queried by various query engines.
 
-### I am seeing lot of archive files. How do I control the number of archive 
commit files generated?
+### How do I configure Bloom filter (when Bloom/Global\_Bloom index is used)?
 
-Please note that in cloud stores that do not support log append operations, 
Hudi is forced to create new archive files to archive old metadata operations.  
You can increase hoodie.commits.archival.batch moving forward to increase the 
number of commits archived per archive file. In addition, you can increase the 
difference between the 2 watermark configurations : hoodie.keep.max.commits 
(default : 30) and hoodie.keep.min.commits (default : 20). This way, you can 
reduce the number of archi [...]
-
-### How do I configure Bloom filter (when Bloom/Global_Bloom index is used)? 
-
-Bloom filters are used in bloom indexes to look up the location of record keys 
in write path. Bloom filters are used only when the index type is chosen as 
“BLOOM” or “GLOBAL_BLOOM”. Hudi has few config knobs that users can use to tune 
their bloom filters.
+Bloom filters are used in bloom indexes to look up the location of record keys 
in the write path. Bloom filters are used only when the index type is chosen as 
“BLOOM” or “GLOBAL\_BLOOM”. Hudi has a few config knobs that users can use to 
tune their bloom filters.
 
 On a high level, hudi has two types of blooms: Simple and Dynamic.
 
@@ -506,61 +516,27 @@ Simple, as the name suggests, is simple. Size is 
statically allocated based on f
 
 `hoodie.index.bloom.num_entries` refers to the total number of entries per 
bloom filter, which refers to one file slice. Default value is 60000.
 
-`hoodie.index.bloom.fpp` refers to the false positive probability with the 
bloom filter. Default value: 1*10^-9.
+`hoodie.index.bloom.fpp` refers to the false positive probability with the 
bloom filter. Default value: 1\*10^-9.
 
-Size of the bloom filter depends on these two values. This is statically 
allocated and here is the formula that determines the size of bloom. Until the 
total number of entries added to the bloom is within the configured 
`hoodie.index.bloom.num_entries` value, the fpp will be honored. i.e. with 
default values of 60k and 1*10^-9, bloom filter serialized size = 430kb. But if 
more entries are added, then the false positive probability will not be 
honored. Chances that more false positives co [...]
+Size of the bloom filter depends on these two values. This is statically 
allocated and here is the formula that determines the size of bloom. Until the 
total number of entries added to the bloom is within the configured 
`hoodie.index.bloom.num_entries` value, the fpp will be honored. i.e. with 
default values of 60k and 1\*10^-9, bloom filter serialized size = 430kb. But 
if more entries are added, then the false positive probability will not be 
honored. Chances that more false positives c [...]
 
 Hudi suggests to have roughly 100 to 120 mb sized files for better query 
performance. So, based on the record size, one could determine how many records 
could fit into one data file.
 
-Lets say your data file max size is 128Mb and default avg record size is 1024 
bytes. Hence, roughly this translates to 130k entries per data file. For this 
config, you should set num_entries to ~130k.
+Let's say your data file max size is 128 MB and the default avg record size is 1024 
bytes. This roughly translates to 130k entries per data file. For this 
config, you should set num\_entries to ~130k.
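
The arithmetic above can be sketched as follows. The helper name is hypothetical, and the estimate assumes records occupy their average size on disk, so treat the result as a starting point rather than an exact figure.

```python
def bloom_num_entries(max_file_size_bytes: int, avg_record_size_bytes: int) -> int:
    """Rough number of records that fit into one data file."""
    if avg_record_size_bytes <= 0:
        raise ValueError("avg_record_size_bytes must be positive")
    return max_file_size_bytes // avg_record_size_bytes


# 128 MB data files with ~1024-byte records, as in the example above.
entries = bloom_num_entries(128 * 1024 * 1024, 1024)

# Write option using the config key named in this FAQ entry.
hudi_options = {
    "hoodie.index.bloom.num_entries": str(entries),
}
print(hudi_options)
```

With these inputs the estimate comes out at 131072 entries, which is the "~130k" quoted above.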
 
 Dynamic bloom filter:
 
 `hoodie.bloom.index.filter.type` : DYNAMIC
 
-This is an advanced version of the bloom filter which grows dynamically as the 
number of entries grows. So, users are expected to set two values wrt 
num_entries. `hoodie.index.bloom.num_entries` will determine the starting size 
of the bloom. `hoodie.bloom.index.filter.dynamic.max.entries` will determine 
the max size to which the bloom can grow upto. And fpp needs to be set similar 
to “Simple” bloom filter. Bloom size will be allotted based on the first config 
`hoodie.index.bloom.num_entr [...]
-
-### How to tune shuffle parallelism of Hudi jobs ?
-
-First, let's understand what the term parallelism means in the context of Hudi 
jobs. For any Hudi job using Spark, parallelism equals to the number of spark 
partitions that should be generated for a particular stage in the DAG. To 
understand more about spark partitions, read this 
[article](https://www.dezyre.com/article/how-data-partitioning-in-spark-helps-achieve-more-parallelism/297).
 In spark, each spark partition is mapped to a spark task that can be executed 
on an executor. Typicall [...]
-
-(*Spark Application → N Spark Jobs → M Spark Stages → T Spark Tasks*) on (*E 
executors with C cores*)
-
-A spark application can be given E number of executors to run the spark 
application on. Each executor might hold 1 or more spark cores. Every spark 
task will require atleast 1 core to execute, so imagine T number of tasks to be 
done in Z time depending on C cores. The higher C, Z is smaller.
-
-With this understanding, if you want your DAG stage to run faster, *bring T as 
close or higher to C*. Additionally, this parallelism finally controls the 
number of output files you write using a Hudi based job. Let's understand the 
different kinds of knobs available:
-
-[BulkInsertParallelism](https://hudi.apache.org/docs/configurations#hoodiebulkinsertshuffleparallelism)
 → This is used to control the parallelism with which output files will be 
created by a Hudi job. The higher this parallelism, the more number of tasks 
are created and hence the more number of output files will eventually be 
created. Even if you define 
[parquet-max-file-size](https://hudi.apache.org/docs/configurations#hoodieparquetmaxfilesize)
 to be of a high value, if you make paralle [...]
-
-[Upsert](https://hudi.apache.org/docs/configurations#hoodieupsertshuffleparallelism)
 / [Insert 
Parallelism](https://hudi.apache.org/docs/configurations#hoodieinsertshuffleparallelism)
 → This is used to control how fast the read process should be when reading 
data into the job. Find more details 
[here](https://hudi.apache.org/docs/configurations/).  
-
-### INT96, INT64 and timestamp compatibility
-
-https://hudi.apache.org/docs/configurations#hoodiedatasourcehive_syncsupport_timestamp
-
-### How to convert an existing COW table to MOR? 
-
-All you need to do is to edit the table type property in hoodie.properties 
(located at hudi_table_path/.hoodie/hoodie.properties). 
-But manually changing it will result in checksum errors. So, we have to go via 
hudi-cli. 
-
-1. Copy existing hoodie.properties to a new location. 
-2. Edit table type to MERGE_ON_READ
-3. launch hudi-cli 
-   1. connect --path hudi_table_path
-   2. repair overwrite-hoodie-props --new-props-file new_hoodie.properties
-
-### Can I get notified when new commits happen in my Hudi table?
-
-Yes. Hudi provides the ability to post a callback notification about a write 
commit. You can use a http hook or choose to 
-be notified via a Kafka/pulsar topic or plug in your own implementation to get 
notified. Please refer 
[here](https://hudi.apache.org/docs/next/writing_data/#commit-notifications)
-for details
+This is an advanced version of the bloom filter which grows dynamically as the 
number of entries grows. So, users are expected to set two values wrt 
num\_entries. `hoodie.index.bloom.num_entries` will determine the starting size 
of the bloom. `hoodie.bloom.index.filter.dynamic.max.entries` will determine 
the max size up to which the bloom can grow. And fpp needs to be set similar 
to “Simple” bloom filter. Bloom size will be allotted based on the first config 
`hoodie.index.bloom.num_ent [...]
 
 ### How do I verify datasource schema reconciliation in Hudi?
 
 With Hudi you can reconcile schema, meaning you can apply target table schema 
on your incoming data, so if there's a missing field in your batch it'll be 
injected null value. You can enable schema reconciliation using 
[hoodie.datasource.write.reconcile.schema](https://hudi.apache.org/docs/configurations/#hoodiedatasourcewritereconcileschema)
 config.
 
 Example how schema reconciliation works with Spark:
-```python
+
+```python
 hudi_options = {
     'hoodie.table.name': "test_recon1",
     'hoodie.datasource.write.recordkey.field': 'uuid',
@@ -594,88 +570,92 @@ spark.sql("select * from hudi.test_recon1;").show()
 
 After first write:
 
-|_hoodie_commit_time|_hoodie_commit_seqno|_hoodie_record_key|_hoodie_partition_path|
   _hoodie_file_name|              Url| ts|uuid|
-|-------------------|--------------------|------------------|----------------------|--------------------|-----------------|---|----|
-|  20220622204044318|20220622204044318...|                 1|                  
    |890aafc0-d897-44d...|hudi.apache.com|  1|   1|
+| \_hoodie\_commit\_time | \_hoodie\_commit\_seqno | \_hoodie\_record\_key | 
\_hoodie\_partition\_path | \_hoodie\_file\_name | Url | ts | uuid |
+| ---| ---| ---| ---| ---| ---| ---| --- |
+| 20220622204044318 | 20220622204044318... | 1 |  | 890aafc0-d897-44d... | 
[hudi.apache.com](http://hudi.apache.com) | 1 | 1 |
 
 After the second write:
 
-|_hoodie_commit_time|_hoodie_commit_seqno|_hoodie_record_key|_hoodie_partition_path|
   _hoodie_file_name|              Url| ts|uuid|
-|-------------------|--------------------|------------------|----------------------|--------------------|-----------------|---|----|
-|  20220622204044318|20220622204044318...|                 1|                  
    |890aafc0-d897-44d...|hudi.apache.com|  1|   1|
-|  20220622204208997|20220622204208997...|                 2|                  
    |890aafc0-d897-44d...|             null|  1|   2|
+| \_hoodie\_commit\_time | \_hoodie\_commit\_seqno | \_hoodie\_record\_key | 
\_hoodie\_partition\_path | \_hoodie\_file\_name | Url | ts | uuid |
+| ---| ---| ---| ---| ---| ---| ---| --- |
+| 20220622204044318 | 20220622204044318... | 1 |  | 890aafc0-d897-44d... | 
[hudi.apache.com](http://hudi.apache.com) | 1 | 1 |
+| 20220622204208997 | 20220622204208997... | 2 |  | 890aafc0-d897-44d... | 
null | 1 | 2 |
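
The null `Url` in the second row above can be reproduced with a minimal pure-Python sketch. This is not Hudi's implementation: `reconcile_record` and `TARGET_FIELDS` are hypothetical names used only to illustrate how applying the target table schema fills a missing field with null.

```python
# Illustration only: mimic how a field missing from an incoming batch
# ends up as null once the target table schema is applied.
TARGET_FIELDS = ["Url", "ts", "uuid"]  # assumed target table schema


def reconcile_record(record, target_fields=TARGET_FIELDS):
    """Project a record onto the target schema, filling missing fields with None."""
    return {field: record.get(field) for field in target_fields}


first_batch = {"Url": "hudi.apache.com", "ts": 1, "uuid": 1}
second_batch = {"ts": 1, "uuid": 2}  # 'Url' is missing from this batch

print(reconcile_record(first_batch))
print(reconcile_record(second_batch))
```

The second record keeps its own `ts` and `uuid` values and gains a null `Url`, matching the table after the second write.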
 
+### Can I change keygenerator for an existing table?
 
-### I see two different records for the same record key value, each record key 
with a different timestamp format. How is this possible?
+No. There is a small set of properties that cannot change once chosen, and 
KeyGenerator is one of them. 
[Here](https://github.com/apache/hudi/blob/3f37d4fb08169c95930f9cc32389abf4e5cd5551/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieWriterUtils.scala#L128)
 is a code reference where we
 
-This is a known issue with enabling row-writer for bulk_insert operation. When 
you do a bulk_insert followed by another 
-write operation such as upsert/insert this might be observed for timestamp 
fields specifically. For example, bulk_insert might produce
-timestamp `2016-12-29 09:54:00.0` for record key whereas non bulk_insert write 
operation might produce a long value like
-`1483023240000000` for the record key thus creating two different records. To 
fix this, starting 0.10.1 a new config 
[hoodie.datasource.write.keygenerator.consistent.logical.timestamp.enabled](https://hudi.apache.org/docs/configurations/#hoodiedatasourcewritekeygeneratorconsistentlogicaltimestampenabled)
 
-is introduced to bring consistency irrespective of whether row writing is 
enabled on not. However, for the sake of 
-backwards compatibility and not breaking existing pipelines, this config is 
set to false by default and will have to be enabled explicitly.
+validate the properties.
 
+### Is Hudi JVM dependent? Does Hudi leverage Java specific serialization?
 
-### Can I switch from one index type to another without having to rewrite the 
entire table?
+Hudi was not originally designed as a database layer that would fit under the 
various big data query engines, that were painfully hard to integrate with 
(Spark did not have DataSet/DataSource APIs, Trino was still Presto, Presto SPI 
was still budding, Hive storage handlers were just out). Popular engines 
including Spark, Flink, Presto, Trino, and Athena do not have issues 
integrating with Hudi as they are all based on JVM, and access to 
Timeline, Metadata table are well-abstracted [...]
 
-It should be okay to switch between Bloom index and Simple index as long as 
they are not global. 
-Moving from global to non-global and vice versa may not work. Also switching 
between Hbase (gloabl index) and regular bloom might not work.
+Since it was not thought of as a "format", the focus on the APIs for such 
lower level integrations and documenting the serialized bytes has been 
historically inadequate. However, with some understanding of the serialization, 
and by looking beyond the APIs used and focusing on what the serialized bytes 
are, it is possible to integrate Hudi from outside the JVM. For example, Bloom 
filters are serialized as hex strings, from byte arrays/primitive types, and 
should be 
**readable cross language**. The Hudi Lo [...]
 
-### How can I resolve the NoSuchMethodError from HBase when using Hudi with 
metadata table on HDFS?
-From 0.11.0 release, we have upgraded the HBase version to 2.4.9, which is 
released based on Hadoop 2.x.  Hudi's metadata
-table uses HFile as the base file format, relying on the HBase library.  When 
enabling metadata table in a Hudi table on
-HDFS using Hadoop 3.x, NoSuchMethodError can be thrown due to compatibility 
issues between Hadoop 2.x and 3.x.
-To address this, here's the workaround:
+**_Note_**: _In a recent release the delete block keys were unintentionally 
serialized as kryo; this is being fixed in the 0.14 release. Thankfully, since 
Hudi’s log blocks and format are versioned, when the file slice is compacted 
things return to normal._
 
-(1) Download HBase source code from `https://github.com/apache/hbase`;
+## Integrations
 
-(2) Switch to the source code of 2.4.9 release with the tag `rel/2.4.9`:
-```shell
-git checkout rel/2.4.9
-```
+### Does AWS Glue support Hudi?
 
-(3) Package a new version of HBase 2.4.9 with Hadoop 3 version: 
-```shell
-mvn clean install -Denforcer.skip -DskipTests -Dhadoop.profile=3.0 
-Psite-install-step
-```
+AWS Glue jobs can write, read and update Glue Data Catalog for Hudi tables. In 
order to successfully integrate with Glue Data Catalog, you need to subscribe 
to one of the AWS provided Glue connectors named "AWS Glue Connector for Apache 
Hudi". The Glue job needs to have the "Use Glue data catalog as the Hive 
metastore" option ticked. Detailed steps with sample scripts are available in 
this article provided by AWS - 
[https://aws.amazon.com/blogs/big-data/writing-to-apache-hudi-tables-using-aws-g
 [...]
 
-(4) Package Hudi again.
+If you are using either notebooks or Zeppelin through Glue dev-endpoints, 
your script might not be able to integrate with Glue Data Catalog when writing 
to Hudi tables.
 
-### How can I resolve the RuntimeException saying `hbase-default.xml file 
seems to be for an older version of HBase`?
+### How to override Hudi jars in EMR?
 
-This usually happens when there are other HBase libs provided by the runtime 
environment in the classpath, such as
-Cloudera CDP stack, causing the conflict.  To get around the RuntimeException, 
you can set the
-`hbase.defaults.for.version.skip` to `true` in the `hbase-site.xml` 
configuration file, e.g., overwriting the config
-within the Cloudera manager.
+If you are looking to override Hudi jars in your EMR clusters, one way to 
achieve this is by providing the Hudi jars through a bootstrap script.
 
-### How can I find the average record size in a commit?
-The `commit showpartitons` command in [HUDI 
CLI](https://hudi.apache.org/docs/cli) will show both "bytes written" and 
-"records inserted." Divide the bytes written by records inserted to find the 
average size. Note that this answer assumes 
-metadata overhead is negligible. For a small dataset (such as 5 columns, 100 
records) this will not be the case.
+Here are the example steps for overriding Hudi version 0.7.0 in EMR 6.2.0.
 
-### How can I resolve the IllegalArgumentException saying `Partitions must be 
in the same table` when attempting to sync to a metastore?
+**Build Hudi Jars:**
 
-This will occur when capital letters are used in the table name. Metastores 
such as Hive automatically convert table names
-to lowercase. While we allow capitalization on Hudi tables, if you would like 
to use a metastore you may be required to
-use all lowercase letters. More details on how this issue presents can be 
found [here](https://github.com/apache/hudi/issues/6832).
+```bash
+# Git clone
+git clone https://github.com/apache/hudi.git && cd hudi   
 
-### How can I reduce table versions created by Hudi in AWS Glue Data Catalog/ 
metastore?
-With each commit, Hudi creates a new table version in the metastore. This can 
be reduced by setting the option 
-[hoodie.datasource.meta_sync.condition.sync](https://hudi.apache.org/docs/configurations#hoodiedatasourcemeta_syncconditionsync)
 to true.
-This will ensure that hive sync is triggered on schema or partitions changes.
+# Get version 0.7.0
+git checkout --track origin/release-0.7.0
 
-### Can I change keygenerator for an existing table?
-No. There are small set of properties that cannot change once chosen. 
KeyGenerator is one among them. 
[Here](https://github.com/apache/hudi/blob/3f37d4fb08169c95930f9cc32389abf4e5cd5551/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieWriterUtils.scala#L128)
 is a code referecne where we
-validate the properties.
+# Build jars with spark 3.0.0 and scala 2.12 (since emr 6.2.0 uses spark 3 
which requires scala 2.12):
+mvn clean package -DskipTests -Dspark3  -Dscala-2.12 -T 30 
+```
+
+**Copy jars to s3:**
+
+These are the jars we are interested in after build completes. Copy them to a 
temp location first.
+
+```bash
+mkdir -p ~/Downloads/hudi-jars
+cp packaging/hudi-hadoop-mr-bundle/target/hudi-hadoop-mr-bundle-0.7.0.jar 
~/Downloads/hudi-jars/
+cp packaging/hudi-hive-sync-bundle/target/hudi-hive-sync-bundle-0.7.0.jar 
~/Downloads/hudi-jars/
+cp packaging/hudi-spark-bundle/target/hudi-spark-bundle_2.12-0.7.0.jar 
~/Downloads/hudi-jars/
+cp 
packaging/hudi-timeline-server-bundle/target/hudi-timeline-server-bundle-0.7.0.jar
 ~/Downloads/hudi-jars/
+cp packaging/hudi-utilities-bundle/target/hudi-utilities-bundle_2.12-0.7.0.jar 
~/Downloads/hudi-jars/
+```
 
+Upload all jars from ~/Downloads/hudi-jars/ to the s3 location 
s3://xxx/yyy/hudi-jars
 
-## Contributing to FAQ
+**Include Hudi jars as part of the emr bootstrap script:**
+
+The script below downloads the Hudi jars from the S3 location above. Use this 
script as part of `bootstrap-actions` when launching the EMR cluster to install 
the jars on each node.
+
+```bash
+#!/bin/bash
+sudo mkdir -p /mnt1/hudi-jars
+
+sudo aws s3 cp s3://xxx/yyy/hudi-jars /mnt1/hudi-jars --recursive
 
-A good and usable FAQ should be community-driven and crowd source 
questions/thoughts across everyone.
+# create symlinks
+cd /mnt1/hudi-jars
+sudo ln -sf hudi-hadoop-mr-bundle-0.7.0.jar hudi-hadoop-mr-bundle.jar
+sudo ln -sf hudi-hive-sync-bundle-0.7.0.jar hudi-hive-sync-bundle.jar
+sudo ln -sf hudi-spark-bundle_2.12-0.7.0.jar hudi-spark-bundle.jar
+sudo ln -sf hudi-timeline-server-bundle-0.7.0.jar 
hudi-timeline-server-bundle.jar
+sudo ln -sf hudi-utilities-bundle_2.12-0.7.0.jar hudi-utilities-bundle.jar
+```
 
-You can improve the FAQ by the following processes
+**Using the overridden jar in Deltastreamer:**
 
-- Raise a PR to spot inaccuracies, typos on this page and leave suggestions.
-- Raise a PR to propose new questions with answers.
-- Lean towards making it very understandable and simple, and heavily link to 
parts of documentation as needed
-- One committer on the project will review new questions and incorporate them 
upon review.
+When invoking DeltaStreamer, specify the above jar location as part of the 
spark-submit command.
diff --git a/website/docs/querying_data.md b/website/docs/querying_data.md
index f0f72f267ee..5b2614d2ad6 100644
--- a/website/docs/querying_data.md
+++ b/website/docs/querying_data.md
@@ -27,7 +27,7 @@ classpath of drivers and executors using `--jars` option. 
Alternatively, hudi-sp
 Retrieve the data table at the present point in time.
 
 ```scala
-val hudiIncQueryDF = spark
+val hudiSnapshotQueryDF = spark
      .read
      .format("hudi")
      .option(DataSourceReadOptions.QUERY_TYPE_OPT_KEY(), 
DataSourceReadOptions.QUERY_TYPE_SNAPSHOT_OPT_VAL())
diff --git a/website/docs/troubleshooting.md b/website/docs/troubleshooting.md
index 1637e84fe86..6398cfc7245 100644
--- a/website/docs/troubleshooting.md
+++ b/website/docs/troubleshooting.md
@@ -1,197 +1,222 @@
 ---
 title: Troubleshooting
 keywords: [hudi, troubleshooting]
+toc_min_heading_level: 2
+toc_max_heading_level: 5
 last_modified_at: 2021-08-18T15:59:57-04:00
 ---
 
-## Troubleshooting
+For performance related issues, please refer to the [tuning 
guide](https://hudi.apache.org/docs/tuning-guide)
 
-Section below generally aids in debugging Hudi failures. Off the bat, the 
following metadata is added to every record to help triage  issues easily using 
standard Hadoop SQL engines (Hive/PrestoDB/Spark)
+### Writing Tables
 
-- **_hoodie_record_key** - Treated as a primary key within each DFS partition, 
basis of all updates/inserts
-- **_hoodie_commit_time** - Last commit that touched this record
-- **_hoodie_file_name** - Actual file name containing the record (super useful 
to triage duplicates)
-- **_hoodie_partition_path** - Path from basePath that identifies the 
partition containing this record
+#### org.apache.parquet.io.InvalidRecordException: Parquet/Avro schema 
mismatch: Avro field 'col1' not found
+It is recommended that the schema evolve in a [backwards compatible 
way](https://docs.confluent.io/platform/current/schema-registry/avro.html) 
while using Hudi. Please refer here for more information on Avro schema 
resolution - 
[https://avro.apache.org/docs/1.8.2/spec.html](https://avro.apache.org/docs/1.8.2/spec.html).
 This error generally occurs when the schema has evolved in a backwards 
**incompatible** way by deleting some column 'col1' and we are trying to update 
some record in parqu [...]
 
-For performance related issues, please refer to the [tuning 
guide](https://hudi.apache.org/docs/tuning-guide)
+The fix for this is to create an uber schema from all the schema versions 
evolved so far for the concerned event and use this uber schema as the target 
schema. One good approach is to fetch the schema from the Hive metastore and 
merge it with the current schema.
 
+Sample stacktrace where a field named "toBeDeletedStr" was omitted from new 
batch of updates : 
[https://gist.github.com/nsivabalan/cafc53fc9a8681923e4e2fa4eb2133fe](https://gist.github.com/nsivabalan/cafc53fc9a8681923e4e2fa4eb2133fe)
 
-### Missing records
+#### java.lang.UnsupportedOperationException: 
org.apache.parquet.avro.AvroConverters$FieldIntegerConverter
 
-Please check if there were any write errors using the admin commands above, 
during the window at which the record could have been written.
-If you do find errors, then the record was not actually written by Hudi, but 
handed back to the application to decide what to do with it.
+This error also occurs when the schema evolves in a non-backwards compatible 
way. There is some incoming update U for a record R which is already written 
to your Hudi dataset in the concerned parquet file. R contains field F with a 
certain data type, say long. U has the same field F with its data type changed 
to int. Such incompatible data type conversions are not supported by Parquet 
FS.
 
-### Duplicates
+For such errors, please ensure that only valid data type conversions happen 
in the primary data source you are ingesting from.
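
As a rough aid for spotting such conversions before ingesting, the sketch below encodes the primitive type promotions listed in the Avro specification's schema resolution rules. It is only an illustration: `is_valid_conversion` is a hypothetical helper, and the checks Hudi and Parquet actually perform may be stricter or cover more types.

```python
# Primitive promotions permitted by Avro schema resolution:
# type of already-written data -> types it may be read as.
AVRO_PROMOTIONS = {
    "int": {"long", "float", "double"},
    "long": {"float", "double"},
    "float": {"double"},
    "string": {"bytes"},
    "bytes": {"string"},
}


def is_valid_conversion(existing_type, incoming_type):
    """Return True if data written as existing_type can be read as incoming_type."""
    if existing_type == incoming_type:
        return True
    return incoming_type in AVRO_PROMOTIONS.get(existing_type, set())


# Field F was written as long and the update carries int: not a valid promotion.
print(is_valid_conversion("long", "int"))
# The opposite direction (int widened to long) is a valid promotion.
print(is_valid_conversion("int", "long"))
```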
 
-First of all, please confirm if you do indeed have duplicates **AFTER** 
ensuring the query is accessing the Hudi table [properly](/docs/querying_data) .
+Sample stacktrace when trying to evolve a field from Long type to Integer type 
with Hudi : 
[https://gist.github.com/nsivabalan/0d81cd60a3e7a0501e6a0cb50bfaacea](https://gist.github.com/nsivabalan/0d81cd60a3e7a0501e6a0cb50bfaacea)
+ 
+#### SchemaCompatabilityException: Unable to validate the rewritten record
 
-- If confirmed, please use the metadata fields above, to identify the physical 
files & partition files containing the records .
-- If duplicates span files across partitionpath, then this means your 
application is generating different partitionPaths for same recordKey, Please 
fix your app
-- if duplicates span multiple files within the same partitionpath, please 
engage with mailing list. This should not happen. You can use the `records 
deduplicate` command to fix your data.
+This can possibly occur if your schema has some non-nullable field whose value 
is not present or is null. It is recommended to evolve schema in backwards 
compatible ways. In essence, this means either have every newly added field as 
nullable or define default values for every new field. See [Schema 
Evolution](https://hudi.apache.org/docs/schema_evolution) docs for more.
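
A minimal sketch of what "nullable or with a default" looks like in an Avro schema, written as a Python dict. The record name `ExampleRecord` and the field `new_col` are hypothetical; only the shape of the newly added field matters here.

```python
import json

# Hypothetical record schema; 'new_col' stands for a newly added field.
schema = {
    "type": "record",
    "name": "ExampleRecord",
    "fields": [
        {"name": "uuid", "type": "string"},
        # Nullable union with "null" listed first and a null default, so
        # records written before the field existed still resolve.
        {"name": "new_col", "type": ["null", "string"], "default": None},
    ],
}


def is_nullable_with_default(field):
    """Check that a field is a union starting with null and declares a default."""
    field_type = field["type"]
    is_nullable = isinstance(field_type, list) and field_type[0] == "null"
    return is_nullable and "default" in field


print(json.dumps(schema, indent=2))
print(is_nullable_with_default(schema["fields"][1]))
```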
 
-### Spark failures {#spark-ui}
+#### INT96, INT64 and timestamp compatibility
 
-Typical upsert() DAG looks like below. Note that Hudi client also caches 
intermediate RDDs to intelligently profile workload and size files and spark 
parallelism.
-Also Spark UI shows sortByKey twice due to the probe job also being shown, 
nonetheless its just a single sort.
+[https://hudi.apache.org/docs/configurations#hoodiedatasourcehive\_syncsupport\_timestamp](https://hudi.apache.org/docs/configurations#hoodiedatasourcehive_syncsupport_timestamp)
 
-<figure>
-    <img className="docimage" 
src={require("/assets/images/hudi_upsert_dag.png").default} 
alt="hudi_upsert_dag.png"  />
-</figure>
+#### I am seeing a lot of archive files. How do I control the number of 
archive commit files generated?
 
-At a high level, there are two steps
+Please note that in cloud stores that do not support log append operations, 
Hudi is forced to create new archive files to archive old metadata operations. 
+You can increase `hoodie.commits.archival.batch` moving forward to increase 
the number of commits archived per archive file. 
+In addition, you can increase the difference between the 2 watermark 
configurations : `hoodie.keep.max.commits` (default : 30) 
+and `hoodie.keep.min.commits` (default : 20). This way, you can reduce the 
number of archive files created and also 
+at the same time increase the number of metadata archived per archive file. 
Note that post 0.7.0 release where we are 
+adding consolidated Hudi metadata 
([RFC-15](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+15%3A+HUDI+File+Listing+Improvements)),
 
+the follow up work would involve re-organizing archival metadata so that we 
can do periodic compactions to control 
+file-sizing of these archive files.
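
As an illustration of the settings named above, the following sketch collects them in a Python options dict of the kind passed to a Spark datasource write. The values are arbitrary examples, not recommendations, and `watermark_gap` is a hypothetical helper that only shows how far apart the two watermarks are.

```python
# Example values only; tune them for your own table and commit frequency.
archival_options = {
    "hoodie.keep.min.commits": "20",
    "hoodie.keep.max.commits": "60",
    "hoodie.commits.archival.batch": "20",
}


def watermark_gap(options):
    """Difference between the max and min commit watermarks."""
    return int(options["hoodie.keep.max.commits"]) - int(
        options["hoodie.keep.min.commits"]
    )


# With the defaults quoted above (30 and 20) the gap is 10; here it is wider.
print(watermark_gap(archival_options))
```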
 
-**Index Lookup to identify files to be changed**
+#### How can I resolve the NoSuchMethodError from HBase when using Hudi with 
metadata table on HDFS?
 
-- Job 1 : Triggers the input data read, converts to HoodieRecord object and 
then stops at obtaining a spread of input records to target partition paths
-- Job 2 : Load the set of file names which we need check against
-- Job 3  & 4 : Actual lookup after smart sizing of spark join parallelism, by 
joining RDDs in 1 & 2 above
-- Job 5 : Have a tagged RDD of recordKeys with locations
+From 0.11.0 release, we have upgraded the HBase version to 2.4.9, which is 
released based on Hadoop 2.x. Hudi's metadata
 
-**Performing the actual writing of data**
+table uses HFile as the base file format, relying on the HBase library. When 
enabling metadata table in a Hudi table on
 
-- Job 6 : Lazy join of incoming records against recordKey, location to provide 
a final set of HoodieRecord which now contain the information about which 
file/partitionpath they are found at (or null if insert). Then also profile the 
workload again to determine sizing of files
-- Job 7 : Actual writing of data (update + insert + insert turned to updates 
to maintain file size)
+HDFS using Hadoop 3.x, NoSuchMethodError can be thrown due to compatibility 
issues between Hadoop 2.x and 3.x.
 
-Depending on the exception source (Hudi/Spark), the above knowledge of the DAG 
can be used to pinpoint the actual issue. The most often encountered failures 
result from YARN/DFS temporary failures.
-In the future, a more sophisticated debug/management UI would be added to the 
project, that can help automate some of this debugging.
+To address this, here's the workaround:
 
-### Common Issues
+(1) Download HBase source code from 
[`https://github.com/apache/hbase`](https://github.com/apache/hbase);
 
-This section lists down all the common issues that users have faced while 
using Hudi. [Contributions](https://hudi.apache.org/contribute/get-involved) 
are always welcome to improve this section.
+(2) Switch to the source code of 2.4.9 release with the tag `rel/2.4.9`:
 
-#### Writing Data
+```bash
+git checkout rel/2.4.9
+```
 
-##### Caused by: org.apache.parquet.io.InvalidRecordException: Parquet/Avro 
schema mismatch: Avro field 'col1' not found
+(3) Package a new version of HBase 2.4.9 with Hadoop 3 version:
 
-It is recommended that schema should evolve in [backwards compatible 
way](https://docs.confluent.io/platform/current/schema-registry/avro.html) 
while using Hudi. Please refer here for more information on avro schema 
resolution - https://avro.apache.org/docs/1.8.2/spec.html. This error generally 
occurs when the schema has evolved in backwards **incompatible** way by 
deleting some column 'col1' and we are trying to update some record in parquet 
file which has alredy been written with previ [...]
+```bash
+mvn clean install -Denforcer.skip -DskipTests -Dhadoop.profile=3.0 
-Psite-install-step
+```
 
-The fix for this is to try and create uber schema using all the schema 
versions evolved so far for the concerned event and use this uber schema as the 
target schema. One of the good approaches can be fetching schema from hive 
metastore and merging it with the current schema.
+(4) Package Hudi again.
 
-Sample stacktrace where a field named "toBeDeletedStr" was omitted from new 
batch of updates : 
https://gist.github.com/nsivabalan/cafc53fc9a8681923e4e2fa4eb2133fe
+#### How can I resolve the RuntimeException saying `hbase-default.xml file 
seems to be for an older version of HBase`?
 
-##### Caused by: java.lang.UnsupportedOperationException: 
org.apache.parquet.avro.AvroConverters$FieldIntegerConverter
+This usually happens when there are other HBase libs provided by the runtime 
environment in the classpath, such as
 
-This error will again occur due to schema evolutions in non-backwards 
compatible way. Basically there is some incoming update U for a record R which 
is already written to your Hudi dataset in the concerned parquet file. R 
contains field F which is having certain data type, let us say long. U has the 
same field F with updated data type of int type. Such incompatible data type 
conversions are not supported by Parquet FS.
+Cloudera CDP stack, causing the conflict. To get around the RuntimeException, 
you can set the
 
-For such errors, please try to ensure only valid data type conversions are 
happening in your primary data source from where you are trying to ingest.
+`hbase.defaults.for.version.skip` to `true` in the `hbase-site.xml` 
configuration file, e.g., overwriting the config
+
+within the Cloudera manager.
+
+#### I see two different records for the same record key value, each record 
key with a different timestamp format. How is this possible?
 
-Sample stacktrace when trying to evolve a field from Long type to Integer type 
with Hudi : https://gist.github.com/nsivabalan/0d81cd60a3e7a0501e6a0cb50bfaacea
+This is a known issue with enabling row-writer for bulk_insert operation. When 
you do a bulk_insert followed by another
+write operation such as upsert/insert this might be observed for timestamp 
fields specifically. For example, bulk_insert might produce
+timestamp `2016-12-29 09:54:00.0` for record key whereas non bulk_insert write 
operation might produce a long value like
+`1483023240000000` for the record key thus creating two different records. To 
fix this, starting 0.10.1 a new config 
[hoodie.datasource.write.keygenerator.consistent.logical.timestamp.enabled](https://hudi.apache.org/docs/configurations/#hoodiedatasourcewritekeygeneratorconsistentlogicaltimestampenabled)
+is introduced to bring consistency irrespective of whether row writing is 
enabled or not. However, for the sake of
+backwards compatibility and not breaking existing pipelines, this config is 
set to false by default and will have to be enabled explicitly.
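
To see that the two key values above describe the same instant, the sketch below converts the long value (epoch microseconds) back into a timestamp string. The UTC-5 offset is an assumption chosen because it reproduces the example; the text does not state which time zone was in effect, and `micros_to_timestamp_string` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the formatted timestamp in the example was rendered in UTC-5.
ASSUMED_TZ = timezone(timedelta(hours=-5))


def micros_to_timestamp_string(epoch_micros, tz=ASSUMED_TZ):
    """Format epoch microseconds as 'YYYY-MM-DD HH:MM:SS' in the given zone."""
    seconds, micros = divmod(epoch_micros, 1_000_000)
    moment = datetime.fromtimestamp(seconds, tz=tz) + timedelta(microseconds=micros)
    return moment.strftime("%Y-%m-%d %H:%M:%S")


print(micros_to_timestamp_string(1483023240000000))
```

Because the same logical timestamp can surface as either representation in the record key, the two writes are treated as different keys unless the consistency config is enabled.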
 
-##### org.apache.hudi.exception.SchemaCompatabilityException: Unable to 
validate the rewritten record &lt;record&gt; against schema &lt;schema&gt; at 
org.apache.hudi.common.util.HoodieAvroUtils.rewrite(HoodieAvroUtils.java:215)
 
-This can possibly occur if your schema has some non-nullable field whose value 
is not present or is null. It is recommended to evolve schema in backwards 
compatible ways. In essence, this means either have every newly added field as 
nullable or define default values for every new field. In case if you are 
relying on default value for your field, as of Hudi version 0.5.1, this is not 
handled.
 
-##### hudi consumes too much space in a temp folder while upsert
+### Querying Tables
 
-When upsert large input data, hudi will spills part of input data to disk when 
reach the max memory for merge. if there is enough memory, please increase 
spark executor's memory and  "hoodie.memory.merge.fraction" option, for example
+#### How can I now query the Hudi dataset I just wrote?
 
-```java
-option("hoodie.memory.merge.fraction", "0.8")
+Unless Hive sync is enabled, the dataset written by Hudi using one of the 
methods above can simply be queried via the Spark datasource like any other 
source.
+
+```scala
+// Snapshot query: read the latest committed state of the table.
+val hudiSnapshotQueryDF = spark
+     .read
+     .format("hudi")
+     .option(DataSourceReadOptions.QUERY_TYPE_OPT_KEY(), 
DataSourceReadOptions.QUERY_TYPE_SNAPSHOT_OPT_VAL())
+     .load(basePath)
+// Incremental query: <beginInstantTime> is a placeholder for a commit instant.
+val hudiIncQueryDF = spark.read.format("hudi")
+     .option(DataSourceReadOptions.VIEW_TYPE_OPT_KEY(), 
DataSourceReadOptions.VIEW_TYPE_INCREMENTAL_OPT_VAL())
+     .option(DataSourceReadOptions.BEGIN_INSTANTTIME_OPT_KEY(), 
<beginInstantTime>)
+     .load(basePath)
 ```
 
-#### Ingestion
+If Hive sync is enabled in the 
[deltastreamer](https://github.com/apache/hudi/blob/d3edac4612bde2fa9deca9536801dbc48961fb95/docker/demo/sparksql-incremental.commands#L50)
 tool or 
[datasource](https://hudi.apache.org/docs/configurations#hoodiedatasourcehive_syncenable),
 the dataset is available in Hive as a couple of tables, that can now be read 
using HiveQL, Presto or SparkSQL. See 
[here](https://hudi.apache.org/docs/querying_data/) for more.
 
-##### Caused by: java.io.EOFException: Received -1 when reading from channel, 
socket has likely been closed. at kafka.utils.Utils$.read(Utils.scala:381) at 
kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:54)
+### Data Quality Issues
 
-This might happen if you are ingesting from Kafka source, your cluster is ssl 
enabled by default and you are using some version of Hudi older than 0.5.1. 
Previous versions of Hudi were using spark-streaming-kafka-0-8 library. With 
the release of 0.5.1 version of Hudi, spark was upgraded to 2.4.4 and 
spark-streaming-kafka library was upgraded to spark-streaming-kafka-0-10. SSL 
support was introduced from spark-streaming-kafka-0-10. Please see here for 
reference.
+The section below generally aids in debugging Hudi failures. To begin with, 
the following metadata is added to every record to help triage issues easily 
using standard Hadoop SQL engines (Hive/PrestoDB/Spark):
 
-The workaround can be either use Kafka cluster which is not ssl enabled, else 
upgrade Hudi version to at least 0.5.1 or spark-streaming-kafka library to 
spark-streaming-kafka-0-10.
+* **\_hoodie\_record\_key** - Treated as a primary key within each DFS 
partition, basis of all updates/inserts
+* **\_hoodie\_commit\_time** - Last commit that touched this record
+* **\_hoodie\_commit\_seqno** - This field contains a unique sequence number 
for each record within each transaction.
+* **\_hoodie\_file\_name** - Actual file name containing the record (super 
useful to triage duplicates)
+* **\_hoodie\_partition\_path** - Path from basePath that identifies the 
partition containing this record
 
-##### Exception in thread "main" org.apache.kafka.common.KafkaException: 
Failed to construct kafka consumer
-##### Caused by: java.lang.IllegalArgumentException: Could not find a 
'KafkaClient' entry in the JAAS configuration. System property 
'java.security.auth.login.config' is not set
+#### Missing records
 
-This might happen when you are trying to ingest from ssl enabled kafka source 
and your setup is not able to read jars.conf file and its properties. To fix 
this, you need to pass the required property as part of your spark-submit 
command something like
+Please check if there were any write errors using the admin commands, during 
the window at which the record could have been written.
 
-```java
---files jaas.conf,failed_tables.json --conf 
'spark.driver.extraJavaOptions=-Djava.security.auth.login.config=jaas.conf' 
--conf 
'spark.executor.extraJavaOptions=-Djava.security.auth.login.config=jaas.conf'
-```
+If you do find errors, then the record was not actually written by Hudi, but 
handed back to the application to decide what to do with it.
 
-##### com.uber.hoodie.exception.HoodieException: created_at(Part -created_at) 
field not found in record. Acceptable fields were :[col1, col2, col3, id, name, 
dob, created_at, updated_at]
+#### Duplicates
 
-Happens generally when field marked as recordKey or partitionKey is not 
present in some incoming record. Please cross verify your incoming record once.
+First of all, please confirm if you do indeed have duplicates **AFTER** 
ensuring the query is accessing the Hudi table 
[properly](https://hudi.apache.org/docs/querying_data/).
 
-##### If it is possible to use a nullable field that contains null records as 
a primary key when creating hudi table
+*   If confirmed, please use the metadata fields above to identify the 
physical files & partition files containing the records.
+*   If duplicates span files across partitionpath, then your application is 
generating different partitionPaths for the same recordKey. Please fix your 
app.
+*   If duplicates span multiple files within the same partitionpath, please 
engage with the mailing list. This should not happen. You can use the `records 
deduplicate` command to fix your data.
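
The triage steps in the list above can be sketched in plain Python over rows that carry the Hudi metadata fields. This is an illustrative helper, not a Hudi tool: `classify_duplicates` and the sample rows are hypothetical, and in practice the rows would come from a query selecting `_hoodie_record_key`, `_hoodie_partition_path` and `_hoodie_file_name`.

```python
from collections import defaultdict


def classify_duplicates(rows):
    """Report, per duplicated record key, how its copies are spread."""
    counts = defaultdict(int)
    partitions = defaultdict(set)
    files = defaultdict(set)
    for row in rows:
        key = row["_hoodie_record_key"]
        counts[key] += 1
        partitions[key].add(row["_hoodie_partition_path"])
        files[key].add(row["_hoodie_file_name"])

    report = {}
    for key, count in counts.items():
        if count < 2:
            continue  # not a duplicate
        if len(partitions[key]) > 1:
            report[key] = "across partitions"  # app generates differing partition paths
        elif len(files[key]) > 1:
            report[key] = "multiple files in one partition"  # should not happen
        else:
            report[key] = "same file"
    return report


# Hypothetical sample rows for illustration.
sample_rows = [
    {"_hoodie_record_key": "1", "_hoodie_partition_path": "2023/01/01", "_hoodie_file_name": "a.parquet"},
    {"_hoodie_record_key": "1", "_hoodie_partition_path": "2023/01/02", "_hoodie_file_name": "b.parquet"},
    {"_hoodie_record_key": "2", "_hoodie_partition_path": "2023/01/01", "_hoodie_file_name": "a.parquet"},
    {"_hoodie_record_key": "2", "_hoodie_partition_path": "2023/01/01", "_hoodie_file_name": "c.parquet"},
    {"_hoodie_record_key": "3", "_hoodie_partition_path": "2023/01/01", "_hoodie_file_name": "a.parquet"},
]

print(classify_duplicates(sample_rows))
```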
 
-No, will throw HoodieKeyException
+### Ingestion
 
-```scala
-Caused by: org.apache.hudi.exception.HoodieKeyException: recordKey value: 
"null" for field: "name" cannot be null or empty.
-  at 
org.apache.hudi.keygen.SimpleKeyGenerator.getKey(SimpleKeyGenerator.java:58)
-  at 
org.apache.hudi.HoodieSparkSqlWriter$$anonfun$1.apply(HoodieSparkSqlWriter.scala:104)
-  at 
org.apache.hudi.HoodieSparkSqlWriter$$anonfun$1.apply(HoodieSparkSqlWriter.scala:100)
+#### java.io.EOFException: Received -1 when reading from channel, socket has 
likely been closed. at kafka.utils.Utils$.read(Utils.scala:381) at 
kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:54)
+
+This might happen if you are ingesting from Kafka source, your cluster is ssl 
enabled by default and you are using some version of Hudi older than 0.5.1. 
Previous versions of Hudi were using spark-streaming-kafka-0-8 library. With 
the release of 0.5.1 version of Hudi, spark was upgraded to 2.4.4 and 
spark-streaming-kafka library was upgraded to spark-streaming-kafka-0-10. SSL 
support was introduced from spark-streaming-kafka-0-10. Please see here for 
reference.
+
+The workaround is to either use a Kafka cluster which is not ssl enabled, or 
upgrade the Hudi version to at least 0.5.1 or the spark-streaming-kafka library 
to spark-streaming-kafka-0-10.
+
+#### java.lang.IllegalArgumentException: Could not find a 'KafkaClient' entry 
in the JAAS configuration. System property 'java.security.auth.login.config' is 
not set
+
+This might happen when you are trying to ingest from an ssl enabled Kafka 
source and your setup is not able to read the jaas.conf file and its 
properties. To fix this, you need to pass the required property as part of your 
spark-submit command, something like
+
+```plain
+--files jaas.conf,failed_tables.json --conf 
'spark.driver.extraJavaOptions=-Djava.security.auth.login.config=jaas.conf' 
--conf 
'spark.executor.extraJavaOptions=-Djava.security.auth.login.config=jaas.conf'
 ```
 
-#### IOException: Write end dead or CIRCULAR REFERENCE while writing to GCS 
+#### IOException: Write end dead or CIRCULAR REFERENCE while writing to GCS
+
 If you encounter below stacktrace, please set the spark config as suggested 
below.
-```
+
+```plain
 --conf 'spark.hadoop.fs.gs.outputstream.pipe.type=NIO_CHANNEL_PIPE'
 ```
 
-```
+```plain
  at 
org.apache.hudi.io.storage.HoodieAvroParquetWriter.close(HoodieAvroParquetWriter.java:84)
        Suppressed: java.io.IOException: Upload failed for 
'gs://bucket/b0ee4274-5193-4a26-bcff-d60654fd7b24-0_0-42-671_20230228055305900.parquet'
-               at 
com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.util.BaseAbstractGoogleAsyncWriteChannel.waitForCompletionAndThrowIfUploadFailed(BaseAbstractGoogleAsyncWriteChannel.java:260)
-               at 
com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.util.BaseAbstractGoogleAsyncWriteChannel.write(BaseAbstractGoogleAsyncWriteChannel.java:121)
-               at 
java.base/java.nio.channels.Channels.writeFullyImpl(Channels.java:74)
-               at 
java.base/java.nio.channels.Channels.writeFully(Channels.java:97)
-               at 
java.base/java.nio.channels.Channels$1.write(Channels.java:172)
-               at 
java.base/java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:81)
-               at 
java.base/java.io.BufferedOutputStream.flush(BufferedOutputStream.java:142)
-               at 
java.base/java.io.FilterOutputStream.close(FilterOutputStream.java:182)
+               at...
                ... 44 more
        Caused by: java.io.IOException: Write end dead
                at 
java.base/java.io.PipedInputStream.read(PipedInputStream.java:310)
                at 
java.base/java.io.PipedInputStream.read(PipedInputStream.java:377)
                at 
com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.util.ByteStreams.read(ByteStreams.java:172)
-               at 
com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.googleapis.media.MediaHttpUploader.buildContentChunk(MediaHttpUploader.java:610)
-               at 
com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.googleapis.media.MediaHttpUploader.resumableUpload(MediaHttpUploader.java:380)
-               at 
com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.googleapis.media.MediaHttpUploader.upload(MediaHttpUploader.java:308)
-               at 
com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:539)
-               at 
com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:466)
-               at 
com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:576)
-               at 
com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel$UploadOperation.call(AbstractGoogleAsyncWriteChannel.java:85)
-               at 
java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
+               at ...
                ... 3 more
 Caused by: [CIRCULAR REFERENCE: java.io.IOException: Write end dead]
 ```
 
-We have an active patch(https://github.com/apache/hudi/pull/7245) on fixing 
the issue. Until we land this, you can use above config to bypass the issue.  
+We have an active patch 
([https://github.com/apache/hudi/pull/7245](https://github.com/apache/hudi/pull/7245))
 to fix the issue. Until it lands, you can use the above config to bypass 
the issue.
 
-#### Hive Sync
+### Hive Sync
+#### SQLException: following columns have types incompatible
 
-##### Caused by: java.sql.SQLException: Error while processing statement: 
FAILED: Execution Error, return code 1 from 
org.apache.hadoop.hive.ql.exec.DDLTask. Unable to alter table. The following 
columns have types incompatible with the existing columns in their respective 
positions : __col1,__col2
+```plain
+Error while processing statement: FAILED: Execution Error, return code 1 from 
org.apache.hadoop.hive.ql.exec.DDLTask. Unable to alter table. The following 
columns have types incompatible with the existing columns in their respective 
positions : col1,col2
+```
 
-This will usually happen when you are trying to add a new column to existing 
hive table using our HiveSyncTool.java class. Databases usually will not allow 
to modify a column datatype from a higher order to lower order or cases where 
the datatypes may clash with the data that is already stored/will be stored in 
the table. To fix the same, try setting the following property -
+This will usually happen when you are trying to add a new column to an 
existing Hive table using our 
[HiveSyncTool.java](https://github.com/apache/hudi/blob/be4dfccbb24794dfac3714818971229870d24a2c/hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncTool.java)
 class. Databases usually will not allow modifying a column datatype from a 
higher order to a lower order, or in cases where the datatypes may clash with 
the data that is already stored/will be stored in the table. To fix the [...]
 
-```scala
+```plain
 set hive.metastore.disallow.incompatible.col.type.changes=false;
 ```
 
-##### com.uber.hoodie.hive.HoodieHiveSyncException: Could not convert field 
Type from &lt;type1&gt; to &lt;type2&gt; for field col1
+#### HoodieHiveSyncException: Could not convert field Type from `<type1>` to 
`<type2>` for field col1
 
 This occurs because HiveSyncTool currently supports only few compatible data 
type conversions. Doing any other incompatible change will throw this 
exception. Please check the data type evolution for the concerned field and 
verify if it indeed can be considered as a valid data type conversion as per 
Hudi code base.
 
-##### Caused by: org.apache.hadoop.hive.ql.parse.SemanticException: Database 
does not exist: test_db
+#### org.apache.hadoop.hive.ql.parse.SemanticException: Database does not 
exist: test\_db
 
-This generally occurs if you are trying to do Hive sync for your Hudi dataset 
and the configured hive_sync database does not exist. Please create the 
corresponding database on your Hive cluster and try again.
+This generally occurs if you are trying to do Hive sync for your Hudi dataset 
and the configured hive\_sync database does not exist. Please create the 
corresponding database on your Hive cluster and try again.
 
-##### Caused by: org.apache.thrift.TApplicationException: Invalid method name: 
'get_table_req'
+#### org.apache.thrift.TApplicationException: Invalid method name: 
'get\_table\_req'
 
 This issue is caused by Hive version conflicts. Hudi is built against 
hive-2.3.x, so if you still want Hudi to work with an older Hive version, 
rebuild it as follows:
 
-```scala
+```plain
 Steps: (build with hive-2.1.0)
 1. git clone git@github.com:apache/incubator-hudi.git
 2. rm 
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/hive/HoodieCombineHiveInputFormat.java
 3. mvn clean package -DskipTests -DskipITs -Dhive.version=2.1.0
 ```
 
-##### Caused by : java.lang.UnsupportedOperationException: Table rename is not 
supported
+#### java.lang.UnsupportedOperationException: Table rename is not supported
+
+This issue could occur when syncing to Hive. A possible reason is that Hive 
does not play well with table names that mix upper and lower case letters. Try 
using all lower case letters for your table name and it should likely get fixed. 
Related issue: 
[https://github.com/apache/hudi/issues/2409](https://github.com/apache/hudi/issues/2409)
+
+#### How can I resolve the IllegalArgumentException saying `Partitions must be 
in the same table` when attempting to sync to a metastore?
 
-This issue could occur when syncing to hive. Possible reason is that, hive 
does not play well if your table name has upper and lower case letter. Try to 
have all lower case letters for your table name and it should likely get fixed. 
Related issue: https://github.com/apache/hudi/issues/2409
+This will occur when capital letters are used in the table name. Metastores 
such as Hive automatically convert table names
 
-#### Running from IDE
+to lowercase. While we allow capitalization on Hudi tables, if you would like 
to use a metastore you may be required to
 
-##### "java.lang.IllegalArgumentException: Unsupported class file major 
version 56"
+use all lowercase letters. More details on how this issue presents can be 
found [here](https://github.com/apache/hudi/issues/6832).
 
-Please use java 8, and not java 11
+####
\ No newline at end of file
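
Reviewer note on the `KafkaClient` JAAS entry above: the three options in the FAQ's spark-submit snippet have to agree on the same file name, which is easy to get wrong by hand. Below is a minimal Python sketch that assembles them from one variable. The helper name `jaas_submit_args` is hypothetical and not part of Hudi or Spark; it assumes, as the FAQ snippet does, that the file is referenced by its bare name once shipped with `--files`.

```python
def jaas_submit_args(jaas_file="jaas.conf", extra_files=()):
    """Build spark-submit arguments that ship a JAAS file and point both
    the driver and the executors at it (hypothetical helper)."""
    # --files takes a comma-separated list; the JAAS file goes first.
    files = ",".join([jaas_file, *extra_files])
    java_opt = f"-Djava.security.auth.login.config={jaas_file}"
    return [
        "--files", files,
        "--conf", f"spark.driver.extraJavaOptions={java_opt}",
        "--conf", f"spark.executor.extraJavaOptions={java_opt}",
    ]


if __name__ == "__main__":
    # Reproduces the options from the FAQ entry, one token per element.
    print(" ".join(jaas_submit_args("jaas.conf", ["failed_tables.json"])))
```

The returned list can be appended to whatever command line launches the ingestion job; quoting is left to the caller.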
diff --git a/website/docs/tuning-guide.md b/website/docs/tuning-guide.md
index 4affeafda66..4eaddce2dbd 100644
--- a/website/docs/tuning-guide.md
+++ b/website/docs/tuning-guide.md
@@ -9,27 +9,77 @@ last_modified_at: 2021-09-29T15:59:57-04:00
 To get a better understanding of where your Hudi jobs is spending its time, 
use a tool like [YourKit Java Profiler](https://www.yourkit.com/download/), to 
obtain heap dumps/flame graphs.
 :::
 
+## Writing
+
+### General Tips
+
 Writing data via Hudi happens as a Spark job and thus general rules of spark 
debugging applies here too. Below is a list of things to keep in mind, if you 
are looking to improving performance or reliability.
 
-**Input Parallelism** : By default, Hudi tends to over-partition input (i.e 
`withParallelism(1500)`), to ensure each Spark partition stays within the 2GB 
limit for inputs upto 500GB. Bump this up accordingly if you have larger 
inputs. We recommend having shuffle parallelism 
`hoodie.[insert|upsert|bulkinsert].shuffle.parallelism` such that its atleast 
input_data_size/500MB
+**Input Parallelism** : By default, Hudi follows the input parallelism. Bump 
this up accordingly if you have larger inputs, which can cause more shuffles. We 
recommend tuning shuffle parallelism 
hoodie.[insert|upsert|bulkinsert].shuffle.parallelism such that it is at least 
input_data_size/500MB.
+
+**Off-heap memory** : Hudi writes parquet files, which needs a good amount of 
off-heap memory proportional to schema width. Consider setting something like 
spark.executor.memoryOverhead or spark.driver.memoryOverhead, if you are 
running into such failures.
+
+**Spark Memory** : Typically, Hudi needs to be able to read a single file into 
memory to perform merges or compactions and thus the executor memory should be 
sufficient to accommodate this. In addition, Hudi caches the input to be able to 
intelligently place data and thus leaving some `spark.memory.storageFraction` 
will generally help boost performance.
+
+**Sizing files**: Set target file sizes judiciously, to balance ingest/write 
latency vs number of files & consequently metadata overhead associated with it.
+
+**Timeseries/Log data** : Default configs are tuned for database/nosql 
changelogs where individual record sizes are large. Another very popular class 
of data is timeseries/event/log data that tends to be more voluminous with a lot 
more records per partition. In such cases consider tuning the bloom filter 
accuracy to achieve your target index look up time or use a bucketed index 
configuration. Also, consider making a key that is prefixed with time of the 
event, which will enable range pruni [...]
+
+### Spark failures {#spark-ui}
+
+A typical upsert() DAG looks like below. Note that the Hudi client also caches 
intermediate RDDs to intelligently profile workload and size files and spark 
parallelism.
+Also, the Spark UI shows sortByKey twice due to the probe job also being shown; 
nonetheless it is just a single sort.
+<figure>
+    <img className="docimage" 
src={require("/assets/images/hudi_upsert_dag.png").default} 
alt="hudi_upsert_dag.png"  />
+</figure>
+
+**At a high level, there are two steps**:
+
+*Index Lookup to identify files to be changed*
+
+- Job 1 : Triggers the input data read, converts to HoodieRecord object and 
then stops at obtaining a spread of input records to target partition paths
+- Job 2 : Load the set of file names which we need to check against
+- Job 3  & 4 : Actual lookup after smart sizing of spark join parallelism, by 
joining RDDs in 1 & 2 above
+- Job 5 : Have a tagged RDD of recordKeys with locations
 
-**Off-heap memory** : Hudi writes parquet files and that needs good amount of 
off-heap memory proportional to schema width. Consider setting something like 
`spark.executor.memoryOverhead` or `spark.driver.memoryOverhead`, if you are 
running into such failures.
+*Performing the actual writing of data*
 
-**Spark Memory** : Typically, hudi needs to be able to read a single file into 
memory to perform merges or compactions and thus the executor memory should be 
sufficient to accomodate this. In addition, Hoodie caches the input to be able 
to intelligently place data and thus leaving some 
`spark.memory.storageFraction` will generally help boost performance.
+- Job 6 : Lazy join of incoming records against recordKey, location to provide 
a final set of HoodieRecord which now contain the information about which 
file/partitionpath they are found at (or null if insert). Then also profile the 
workload again to determine sizing of files
+- Job 7 : Actual writing of data (update + insert + insert turned to updates 
to maintain file size)
 
-**Sizing files**: Set `limitFileSize` above judiciously, to balance 
ingest/write latency vs number of files & consequently metadata overhead 
associated with it.
+Depending on the exception source (Hudi/Spark), the above knowledge of the DAG 
can be used to pinpoint the actual issue. The most often encountered failures 
result from YARN/DFS temporary failures.
+In the future, a more sophisticated debug/management UI would be added to the 
project, that can help automate some of this debugging.
 
-**Timeseries/Log data** : Default configs are tuned for database/nosql 
changelogs where individual record sizes are large. Another very popular class 
of data is timeseries/event/log data that tends to be more volumnious with lot 
more records per partition. In such cases consider tuning the bloom filter 
accuracy via `.bloomFilterFPP()/bloomFilterNumEntries()` to achieve your target 
index look up time. Also, consider making a key that is prefixed with time of 
the event, which will enable r [...]
+### Hudi consumes too much space in a temp folder while upsert
 
-**GC Tuning**: Please be sure to follow garbage collection tuning tips from 
Spark tuning guide to avoid OutOfMemory errors. [Must] Use G1/CMS Collector. 
Sample CMS Flags to add to spark.executor.extraJavaOptions:
+When upserting large input data, Hudi spills part of the input data to disk 
once it reaches the max memory for merge. If there is enough memory, please 
increase the Spark executor's memory and the `hoodie.memory.merge.fraction` 
option, for example
+`option("hoodie.memory.merge.fraction", "0.8")`
+
+### How to tune shuffle parallelism of Hudi jobs ?
+
+First, let's understand what the term parallelism means in the context of Hudi 
jobs. For any Hudi job using Spark, parallelism equals the number of Spark 
partitions that should be generated for a particular stage in the DAG. To 
understand more about Spark partitions, read this 
[article](https://www.dezyre.com/article/how-data-partitioning-in-spark-helps-achieve-more-parallelism/297).
 In Spark, each partition is mapped to a task that can be executed 
on an executor. Typicall [...]
+
+(Spark Application → N Spark Jobs → M Spark Stages → T Spark Tasks) on (E 
executors with C cores)
+
+A Spark application can be given E executors to run on. Each executor might 
hold 1 or more cores. Every Spark task requires at least 1 core to execute, so 
imagine T tasks to be done in Z time depending on C cores. The higher C is, 
the smaller Z is.
+
+With this understanding, if you want your DAG stage to run faster, bring T 
close to or higher than C. Additionally, this parallelism finally controls the 
number of output files you write using a Hudi based job. Let's understand the 
different kinds of knobs available:
+
+[BulkInsertParallelism](https://hudi.apache.org/docs/configurations#hoodiebulkinsertshuffleparallelism)
 → This is used to control the parallelism with which output files will be 
created by a Hudi job. The higher this parallelism, the more number of tasks 
are created and hence the more number of output files will eventually be 
created. Even if you define 
[parquet-max-file-size](https://hudi.apache.org/docs/configurations#hoodieparquetmaxfilesize)
 to be of a high value, if you make paralle [...]
+
+[Upsert](https://hudi.apache.org/docs/configurations#hoodieupsertshuffleparallelism)
 / [Insert 
Parallelism](https://hudi.apache.org/docs/configurations#hoodieinsertshuffleparallelism)
 → This is used to control how fast the read process should be when reading 
data into the job. Find more details 
[here](https://hudi.apache.org/docs/configurations/).
+
+### GC Tuning
+
+Please be sure to follow the garbage collection tuning tips from the Spark 
tuning guide to avoid OutOfMemory errors. You must use the G1 or CMS collector. 
Sample CMS flags to add to spark.executor.extraJavaOptions:
 
 ```java
 -XX:NewSize=1g -XX:SurvivorRatio=2 -XX:+UseCompressedOops 
-XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:CMSInitiatingOccupancyFraction=70 
-XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps 
-XX:+PrintGCApplicationStoppedTime -XX:+PrintGCApplicationConcurrentTime 
-XX:+PrintTenuringDistribution -XX:+HeapDumpOnOutOfMemoryError 
-XX:HeapDumpPath=/tmp/hoodie-heapdump.hprof
 ```
 
-**OutOfMemory Errors**: If it keeps OOMing still, reduce spark memory 
conservatively: `spark.memory.fraction=0.2, spark.memory.storageFraction=0.2` 
allowing it to spill rather than OOM. (reliably slow vs crashing intermittently)
+**OutOfMemory Errors**: If the job still keeps running out of memory, reduce 
Spark memory conservatively: spark.memory.fraction=0.2, 
spark.memory.storageFraction=0.2, allowing it to spill rather than OOM 
(reliably slow vs crashing intermittently).
 
-Below is a full working production config
+Below is a full working production config used at Uber (HDFS/Yarn), for their 
ingest platform.
 
 ```scala
 spark.driver.extraClassPath /etc/hive/conf
@@ -42,15 +92,14 @@ spark.executor.id driver
 spark.executor.instances 300
 spark.executor.memory 6g
 spark.rdd.compress true
- 
+
 spark.kryoserializer.buffer.max 512m
 spark.serializer org.apache.spark.serializer.KryoSerializer
 spark.shuffle.service.enabled true
-spark.sql.hive.convertMetastoreParquet false
 spark.submit.deployMode cluster
 spark.task.cpus 1
 spark.task.maxFailures 4
- 
+
 spark.driver.memoryOverhead 1024
 spark.executor.memoryOverhead 3072
 spark.yarn.max.executor.failures 100
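
Reviewer note on the tuning guide above: the "at least input_data_size/500MB" rule and the T-tasks-on-C-cores reasoning are both simple arithmetic, so a small worked example may help readers sanity-check their settings. The Python sketch below is illustrative only. The function names are hypothetical, the 1 TiB input and the 2 cores per executor are made-up numbers, and only the 300 executors and `spark.task.cpus 1` come from the sample config.

```python
import math

MB = 1024 * 1024


def min_shuffle_parallelism(input_bytes, bytes_per_partition=500 * MB):
    """Smallest parallelism that keeps each partition at or under the
    target size, i.e. ceil(input_data_size / 500MB) by default."""
    return max(1, math.ceil(input_bytes / bytes_per_partition))


def task_waves(num_tasks, executors, cores_per_executor, cpus_per_task=1):
    """Number of scheduling waves needed to run T tasks on E executors
    with C cores each; fewer waves generally means a shorter stage."""
    slots = executors * (cores_per_executor // cpus_per_task)
    if slots <= 0:
        raise ValueError("no task slots available with this configuration")
    return math.ceil(num_tasks / slots)


if __name__ == "__main__":
    input_bytes = 1024 * 1024 * MB  # hypothetical 1 TiB input
    parallelism = min_shuffle_parallelism(input_bytes)
    print("hoodie.upsert.shuffle.parallelism >=", parallelism)
    # 300 executors as in the sample config; 2 cores each is an assumption.
    print("waves:", task_waves(parallelism, executors=300, cores_per_executor=2))
```

The result is a lower bound for the shuffle parallelism settings, not a recommendation; a higher value also produces more output files, as the guide notes.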
diff --git a/website/src/pages/tech-specs.md b/website/src/pages/tech-specs.md
index a065ac99b4b..3e6824eb5de 100644
--- a/website/src/pages/tech-specs.md
+++ b/website/src/pages/tech-specs.md
@@ -193,7 +193,7 @@ The log file name format is:
 - **Log File Version** - Current version of the log file format
 - **File Write Token** - Monotonically increasing token for every attempt to 
write the log file. This should help uniquely identifying the log file when 
there are failures and retries. Cleaner can clean-up partial log files if the 
write token is not the latest in the file slice.
 
-The Log file format structure is a Hudi native format. The actual content 
bytes are serialized into one of Apache Avro, Apache Parquet or Apache HFile 
file formats based on configuration and the other metadata in the block is 
serialized using the Java DataOutputStream (DOS) serializer.
+The Log file format structure is a Hudi native format. The actual content 
bytes are serialized into one of Apache Avro, Apache Parquet or Apache HFile 
file formats based on configuration and the other metadata in the block is 
serialized using primitive types and byte arrays. 
 
 Hudi Log format specification is as follows. 
 
@@ -206,11 +206,11 @@ Hudi Log format specification is as follows.
 | **version**            | 4        | Version of the Log file format, 
monotonically increasing to support backwards compatibility                     
                                                                                
                                                                             |
 | **type**               | 4        | Represents the type of the log block. Id 
of the type is serialized as an Integer.                                        
                                                                                
                                                                    |
 | **header length**      | 8        | Length of the header section to follow   
                                                                                
                                                                                
                                                                    |
-| **header**             | variable | Map of header metadata entries. The 
entries are encoded with key as a metadata Id and the value is the String 
representation of the metadata value.                                           
                                                                               |
+| **header**             | variable | Custom serialized map of header metadata 
entries. 4 bytes of map size that denotes number of entries, then for each 
entry 4 bytes of metadata type, followed by length/bytearray of variable length 
utf-8 string.                                                            |
 | **content length**     | 8        | Length of the actual content serialized  
                                                                                
                                                                                
                                                                    |
 | **content**            | variable | The content contains the serialized 
records in one of the supported file formats (Apache Avro, Apache Parquet or 
Apache HFile)                                                                   
                                                                            |
 | **footer length**      | 8        | Length of the footer section to follow   
                                                                                
                                                                                
                                                                    |
-| **footer**             | variable | Similar to Header. Map of footer 
metadata entries. The entries are encoded with key as a metadata Id and the 
value is the String representation of the metadata value.                       
                                                                                
|
+| **footer**             | variable | Similar to Header. Map of footer 
metadata entries.                                                               
                                                                                
                                                                            |
 | **total block length** | 8        | Total size of the block including the 
magic bytes. This is used to determine if a block is corrupt by comparing to 
the block size in the header. Each log block assumes that the block size will 
be last data written in a block. Any data if written after is just ignored. |
 
 Metadata key mapping from Integer to actual metadata is as follows
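
Reviewer note on the new **header** row above: the layout (4 bytes of map size, then per entry 4 bytes of metadata type followed by a length and a UTF-8 byte array) can be sketched as a round trip in Python. This is a sketch under stated assumptions, not the Hudi implementation: the spec text does not give the width of the string length or the byte order, so a 4-byte length and big-endian integers are assumed here, and the metadata type ids 1 and 2 are placeholders.

```python
import struct


def encode_header(entries):
    """Serialize {metadata_type_id: str} as: entry count, then for each
    entry a type id, a byte length, and the UTF-8 bytes.
    Assumes 4-byte big-endian integers for all three numeric fields."""
    out = [struct.pack(">i", len(entries))]
    for type_id, value in entries.items():
        raw = value.encode("utf-8")
        out.append(struct.pack(">ii", type_id, len(raw)))
        out.append(raw)
    return b"".join(out)


def decode_header(buf):
    """Inverse of encode_header, under the same assumptions."""
    (count,) = struct.unpack_from(">i", buf, 0)
    offset = 4
    entries = {}
    for _ in range(count):
        type_id, length = struct.unpack_from(">ii", buf, offset)
        offset += 8
        entries[type_id] = buf[offset:offset + length].decode("utf-8")
        offset += length
    return entries


if __name__ == "__main__":
    # Placeholder type ids and values, for illustration only.
    sample = {1: "20230719121133000", 2: "schema-as-json"}
    encoded = encode_header(sample)
    print(len(encoded), decode_header(encoded) == sample)
```

If the actual length field or byte order differs, only the `struct` format strings need to change.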
