vinothchandar commented on code in PR #6408:
URL: https://github.com/apache/hudi/pull/6408#discussion_r946760002
##########
website/src/pages/tech-specs.md:
##########
@@ -263,68 +274,68 @@ Readers will use snapshot isolation to query a Hudi
dataset at a consistent poin
## Writer Expectations
-Writer into Hudi will have to ingest new records, updates to existing records
or delete records into the dataset. All transactional actions follow the same
state transition as described in the transaction log (timeline) section.
Writers will optimistically create new base and log files and will finally
transition the action state to completed to register all the modifications to
the dataset atomically. Writer merges the data using the following steps
+A writer into Hudi will have to ingest new records, apply updates to existing
records, or delete records from the table. All transactional actions follow the
same state transitions described in the transaction log (timeline) section.
Writers optimistically create new base and log files and finally transition the
action state to completed, registering all modifications to the table
atomically. Writers merge the data using the following steps:
1. The writer will pick a monotonically increasing instant time from the
latest state of the Hudi timeline (**action commit time**) and the last
successful commit instant (**merge commit time**) to merge the changes into.
If the merge succeeds, the action commit time becomes the next successful
commit in the timeline.
-2. For all the incoming records, the writer will have to efficiently determine
if this is an update or insert. This is done by a process called tagging -
which is a batched point lookups of the record key and partition path pairs in
the entire dataset. The efficiency of tagging is critical to the merge
performance. This can be optimized with indexes (bloom, global key value based
index) and caching. New records will not have a tag.
+2. For all the incoming records, the writer has to efficiently determine
whether each is an update or an insert. This is done by a process called
tagging: a batched point lookup of the record key and partition path pairs
across the entire table. The efficiency of tagging is critical to merge
performance. It can be optimized with indexes (bloom, global key-value based
index) and caching. New records will not have a tag.
3. Once records are tagged, the writer can apply them onto the specific file
slice.
- 1. For copy on write, writer will create a new slice (action commit time)
of the base file in the file group
- 2. For merge on read, writer will create a new log file with the action
commit time on the merge commit time file slice
+   1. For CoW, the writer will create a new slice (at the action commit time)
of the base file in the file group.
+   2. For MoR, the writer will create a new log file with the action commit
time on the merge commit time file slice.
4. Deletes are encoded as a special form of update where only the meta fields
and the operation are populated. See the delete block type in the log format
block types.
-5. Once all the writes into the file system is complete, concurrency control
checks happen to ensure there are no overlapping writes and if that succeeds,
the commit action is completed in the timeline atomically making the changes
merged visible for the next reader.
+5. Once all the writes into the file system are complete, concurrency control
checks happen to ensure there are no overlapping writes; if that succeeds, the
commit action is completed in the timeline, atomically making the merged
changes visible to the next reader.
6. Synchronizing indexes and metadata must be done in the same transaction
that commits the modifications to the table.
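The six-step flow above can be modeled end to end in a short sketch. This is a
hypothetical illustration in Python, not Hudi's actual API: `Timeline`,
`tag_records`, and `write` are made-up names, the record index is a plain
dict, and the concurrency-control checks of step 5 are elided.

```python
class Timeline:
    """Minimal timeline model: completed commits plus in-flight instants."""

    def __init__(self):
        self.completed = []    # completed commit instant times, ascending
        self.inflight = set()  # instants requested but not yet completed

    def new_instant(self):
        # Step 1: pick a monotonically increasing instant (action commit time)
        t = max(self.completed + list(self.inflight), default=0) + 1
        self.inflight.add(t)
        return t

    def last_completed(self):
        # Step 1: the merge commit time is the last successful commit
        return self.completed[-1] if self.completed else None

    def complete(self, t):
        # Step 5: completing the action publishes all changes atomically
        self.inflight.discard(t)
        self.completed.append(t)


def tag_records(records, record_index):
    # Step 2: batched point lookup of record keys in the index. A tagged
    # record carries its existing file group (update); None means insert.
    return [(key, value, record_index.get(key)) for key, value in records]


def write(timeline, record_index, records, table_type="MOR"):
    action_time = timeline.new_instant()
    merge_time = timeline.last_completed()
    files = []
    for key, value, file_group in tag_records(records, record_index):
        if file_group is None:
            file_group = f"fg-{key}"   # new file group for an insert
            record_index[key] = file_group
        if table_type == "COW" or merge_time is None:
            # Step 3.1: CoW writes a whole new base file slice
            files.append(f"{file_group}/base_{action_time}.parquet")
        else:
            # Step 3.2: MoR appends a log file onto the merge-time slice
            files.append(f"{file_group}/.log_{action_time}_{merge_time}")
    # Steps 5-6: (concurrency checks and index sync elided) commit atomically
    timeline.complete(action_time)
    return action_time, files
```

The first commit creates base files even for MoR (there is no earlier file
slice to log against); subsequent MoR commits append log files named after
both the action commit time and the merge commit time, mirroring step 3.2.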
-## Balancing data freshness and query performance
+## Balancing write and query performance
-Critical design choice for any dataset is to pick the right trade-offs in the
data freshness and query performance spectrum. Hudi storage format lets the
users decide on this trade-off by picking the table type, record merging and
file sizing.
+A critical design choice for any table is picking the right trade-off on the
data freshness versus query performance spectrum. The Hudi storage format lets
users make this trade-off by choosing the table type, record merging strategy,
and file sizing.
#### Table types
-|                     | Merge Efficiency | Query Efficiency |
-| ------------------- | ---------------- | ---------------- |
-| Copy on Write (COW) | **Inefficient** <br />COW table type creates a new File slice in the file group for every batch of updates. Write amplification can be quite high when the update is spread across multiple file groups. The cost involved can be high over a time period especially on datasets with low data latency requirements. | **Efficient** <br />COW table types create whole readable data files in open source columnar file formats on each merge batch, there is minimum overhead per record in the query engine. Query engines are fairly optimized for accessing files directly in cloud storage. |
-| Merge on Read (MOR) | **Efficient** <br />MOR table type batches the updates to the file slice in a separate optimized Log file, write amplification is amortized over time when sufficient updates are batched. The merge cost involved will be lower than COW since the churn on the records re-written for every update is much lower. | **Inefficient**<br />MOR Table type required record level merging during query. Although there are techniques to make this merge as efficient as possible, there is still a record level overhead to apply the updates batched up for the file slice. The merge cost applies on every query until the compaction applies the updates and creates a new file slice. |
+|                     | Merge Efficiency | Query Efficiency |
+| ------------------- | ---------------- | ---------------- |
+| Copy on Write (COW) | **Tunable** <br />The COW table type creates a new file slice in the file group for every batch of updates. Write amplification can be quite high when the update is spread across multiple file groups, and the cost can add up over time, especially on tables with low data latency requirements. | **Optimal** <br />COW tables produce whole, readable data files in open source columnar file formats on each merge batch, so there is minimal overhead per record in the query engine. Query engines are fairly well optimized for accessing files directly in cloud storage. |
Review Comment:
   @prasannarajaperumal I made this `tunable` vs `optimal`. CoW is optimal for
reads, for example, while you can tune the merge by over-provisioning writers.
This is probably a better way to talk about it?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]