This is an automated email from the ASF dual-hosted git repository.

bhavanisudha pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/asf-site by this push:
     new 886effdb510 [DOCS] Add Record Index Metadata partition documentation 
and other schema details (#9705)
886effdb510 is described below

commit 886effdb510a8b08d8bc19af136263c03d6bd851
Author: Lokesh Jain <[email protected]>
AuthorDate: Thu Sep 21 06:41:12 2023 +0530

    [DOCS] Add Record Index Metadata partition documentation and other schema 
details (#9705)
    
    * [DOCS] Add Record Index Metadata partition documentation and other 
schema details
    
    * Add table
    
    * Address review comments
    
    * Add formatting fixes
    
    ---------
    
    Co-authored-by: Bhavani Sudha Saktheeswaran 
<[email protected]>
---
 website/src/pages/tech-specs.md | 178 ++++++++++++++++++++++++++--------------
 1 file changed, 116 insertions(+), 62 deletions(-)

diff --git a/website/src/pages/tech-specs.md b/website/src/pages/tech-specs.md
index 56155088846..fb3b67e63ab 100644
--- a/website/src/pages/tech-specs.md
+++ b/website/src/pages/tech-specs.md
@@ -58,10 +58,10 @@ Broadly, there can be two types of data files
  1. **Base files** - Files that contain a set of records in columnar file 
formats like Apache Parquet/Orc or indexed formats like HFile format.
 2. **Log files** - Log files contain inserts, updates, and deletes issued 
against a base file, encoded as a series of blocks. More on this 
[below](#log-file-format).
 
-| Table Type          | Trade-off                                              
                                                                                
               |
-|---------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------|
-| Copy-on-Write (CoW) | Data is stored entirely in base files, optimized for 
read performance and ideal for slow changing datasets                           
                 |
-| Merge-on-read (MoR) | Data is stored in a combination of base and log files, 
optimized to [balance the write and read 
performance](##balancing-write-and-query-performance) and ideal for frequently 
changing datasets |
+| Table Type           | Trade-off                                             
                                                                                
                                                           |
+|:---------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| Copy-on-Write (CoW)  | Data is stored entirely in base files, optimized for 
read performance and ideal for slowly changing datasets                         
                                                            |
+| Merge-on-read (MoR)  | Data is stored in a combination of base and log 
files, optimized to [balance the write and read 
performance](#balancing-write-and-query-performance) and ideal for frequently 
changing datasets |
 
 ### Data Model
 Hudi's data model is designed like an updatable database, such as a key-value 
store. Within each partition, data is organized into a key-value model, where 
every record is uniquely identified by a record key. 
@@ -69,24 +69,24 @@ Hudi's data model is designed like an update-able database 
like a key-value stor
 #### User fields
 To write a record into a Hudi table, each record must specify the following 
user fields.
 
-| User fields                 | Description                                    
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
              [...]
-| --------------------------- 
|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 [...]
-| Partitioning key [Optional] | Value of this field defines the directory 
hierarchy within the table base path. This essentially provides an hierarchy 
isolation for managing data and related metadata                                
                                                                                
                                                                                
                                                                                
                      [...]
-| Record key(s)               | Record keys uniquely identify a record within 
each partition if partitioning is enabled                                       
                                                                                
                                                                                
                                                                                
                                                                                
               [...]
-| Ordering field(s)           | Hudi guarantees the uniqueness constraint of 
record key and the conflict resolution configuration manages strategies on how 
to disambiguate when multiple records with the same keys are to be merged into 
the table. The resolution logic can be based on an ordering field or can be 
custom, specific to the table. To ensure consistent behaviour dealing with 
duplicate records, the resolution logic should be commutative, associative and 
idempotent. This is also re [...]
+| User fields                  | Description                                   
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
              [...]
+|:-----------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 [...]
+| Partitioning key [Optional]  | Value of this field defines the directory 
hierarchy within the table base path. This essentially provides a hierarchical 
isolation for managing data and related metadata                                
                                                                                
                                                                                
                                                                                
                     [...]
+| Record key(s)                | Record keys uniquely identify a record within 
each partition if partitioning is enabled                                       
                                                                                
                                                                                
                                                                                
                                                                                
              [...]
+| Ordering field(s)            | Hudi guarantees a uniqueness constraint on 
the record key, and the conflict resolution configuration manages strategies 
for disambiguating when multiple records with the same key are to be merged 
into the table. The resolution logic can be based on an ordering field or can 
be custom, specific to the table. To ensure consistent behaviour when dealing 
with duplicate records, the resolution logic should be commutative, associative 
and idempotent. This is also r [...]
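As a purely illustrative sketch of the merge properties described above, a resolver keyed on an ordering field can look like this (names and record shape are hypothetical, not Hudi's implementation):

```python
# Hypothetical sketch: resolve two records sharing a record key by an
# ordering field. Taking the record with the greater ordering value is
# commutative, associative and idempotent, so merges converge to the
# same survivor regardless of arrival order or repetition.
def resolve(record_a, record_b, ordering_field="ts"):
    """Return the surviving record of two records with the same key."""
    if record_a[ordering_field] >= record_b[ordering_field]:
        return record_a
    return record_b

old = {"key": "id-1", "ts": 100, "value": "v1"}
new = {"key": "id-1", "ts": 200, "value": "v2"}

assert resolve(old, new) == resolve(new, old)  # commutative
assert resolve(new, new) == new                # idempotent
```

Because the resolver only compares ordering values, replaying the same batch of updates any number of times, in any order, yields the same table state.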
 
 #### Meta fields
 
 In addition to the fields specified by the table's schema, the following meta 
fields are added to each record, to unlock incremental processing and ease of 
debugging. These meta fields are part of the table schema and 
 stored with the actual record to avoid re-computation. 
 
-| Hudi meta-fields        | Description                                        
                                                                                
                                                                                
        |
-|-------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| \_hoodie\_commit\_time  | This field contains the commit timestamp in the 
[timeline](#transaction-log-timeline) that created this record. This enables 
granular, record-level history tracking on the table, much like database 
change-data-capture. |
-| \_hoodie\_commit\_seqno | This field contains a unique sequence number for 
each record within each transaction. This serves much like offsets in Apache 
Kafka topics, to enable generating streams out of tables.                       
             |
-| \_hoodie\_record\_key   | Unique record key identifying the record within 
the partition. Key is materialized to avoid changes to key field(s) resulting 
in violating unique constraints maintained within a table.                      
             |
-| \_hoodie\_partition\_path | Partition path under which the record is 
organized into.                                                                 
                                                                                
                  |
-| \_hoodie\_file\_name    | The data file name this record belongs to.         
                                                                                
                                                                                
        |
+| Hudi meta-fields          | Description                                      
                                                                                
                                                                                
           |
+|:--------------------------|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| \_hoodie\_commit\_time    | This field contains the commit timestamp in the 
[timeline](#transaction-log-timeline) that created this record. This enables 
granular, record-level history tracking on the table, much like database 
change-data-capture.  |
+| \_hoodie\_commit\_seqno   | This field contains a unique sequence number for 
each record within each transaction. This serves much like offsets in Apache 
Kafka topics, to enable generating streams out of tables.                       
              |
+| \_hoodie\_record\_key     | Unique record key identifying the record within 
the partition. Key is materialized to avoid changes to key field(s) resulting 
in violating unique constraints maintained within a table.                      
              |
+| \_hoodie\_partition\_path | Partition path under which the record is 
organized                                                                      
                                                                               
                   |
+| \_hoodie\_file\_name      | The data file name this record belongs to.       
                                                                                
                                                                                
           |
 
 Within a given file, all records share the same values for 
`_hoodie_partition_path` and `_hoodie_file_name`, and thus are easily 
compressed away without overhead in columnar file formats. The other fields can 
also be optional for writers
 depending on whether protection against key field changes or incremental 
processing is desired. More on how to populate these fields in the sections 
below.
@@ -111,17 +111,17 @@ Monotonically increasing value to denote strict ordering 
of actions in the timel
  **Action type:**
 Type of action. The following are the actions on the Hudi timeline.
 
-| Action type   | Description                                                  
                                                                                
                                                                                
                                   |
-| ------------- 
|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| commit        | Commit denotes an **atomic write (inserts, updates and 
deletes)** of records in a table. A commit in Hudi is an atomic way of updating 
data, metadata and indexes. The guarantee is that all or none the changes 
within a commit will be visible to the readers |
-| deltacommit   | Special version of `commit` which is applicable only on a 
Merge-on-Read storage engine. The writes are accumulated and batched to improve 
write performance                                                               
                                      |
-| rollback      | Rollback denotes that the changes made by the corresponding 
commit/delta commit were unsuccessful & hence rolled back, removing any partial 
files produced during such a write                                              
                                    |
-| savepoint     | Savepoint is a special marker to ensure a particular commit 
is not automatically cleaned. It helps restore the table to a point on the 
timeline, in case of disaster/data recovery scenarios                           
                                         |
-| restore       | Restore denotes that the table was restored to a particular 
savepoint.                                                                      
                                                                                
                                    |
-| clean         | Management activity that cleans up versions of data files 
that no longer will be accessed                                                 
                                                                                
                                      |
-| compaction    | Management activity to optimize the storage for query 
performance. This action applies the batched up updates from `deltacommit` and 
re-optimizes data files for query performance                                   
                                           |
-| replacecommit | Management activity to replace a set of data files 
atomically with another. It can be used to cluster the data for better query 
performance. This action is different from a `commit` in that the table state 
before and after are logically equivalent         |
-| indexing      | Management activity to update the index with the data. This 
action does not change data, only updates the index aynchronously to data 
changes                                                                         
                                          |
+| Action type    | Description                                                 
                                                                                
                                                                                
                                     |
+|:---------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| commit         | Commit denotes an **atomic write (inserts, updates and 
deletes)** of records in a table. A commit in Hudi is an atomic way of updating 
data, metadata and indexes. The guarantee is that all or none of the changes 
within a commit will be visible to readers  |
+| deltacommit    | Special version of `commit` which is applicable only on a 
Merge-on-Read storage engine. The writes are accumulated and batched to improve 
write performance                                                               
                                       |
+| rollback       | Rollback denotes that the changes made by the corresponding 
commit/delta commit were unsuccessful & hence rolled back, removing any partial 
files produced during such a write                                              
                                     |
+| savepoint      | Savepoint is a special marker to ensure a particular commit 
is not automatically cleaned. It helps restore the table to a point on the 
timeline, in case of disaster/data recovery scenarios                           
                                          |
+| restore        | Restore denotes that the table was restored to a particular 
savepoint.                                                                      
                                                                                
                                     |
+| clean          | Management activity that cleans up versions of data files 
that will no longer be accessed                                                 
                                                                               
                                        |
+| compaction     | Management activity to optimize the storage for query 
performance. This action applies the batched up updates from `deltacommit` and 
re-optimizes data files for query performance                                   
                                            |
+| replacecommit  | Management activity to replace a set of data files 
atomically with another. It can be used to cluster the data for better query 
performance. This action is different from a `commit` in that the table state 
before and after are logically equivalent          |
+| indexing       | Management activity to update the index with the data. This 
action does not change data, only updates the index asynchronously to data 
changes                                                                         
                                            |
 
 **Action state:**
 Denotes the state transition identifier (requested -\> inflight -\> completed)
@@ -152,13 +152,65 @@ By reconciling all the actions in the timeline, the state 
of the Hudi table can
 
 ## Metadata
 
-Hudi automatically extracts the physical data statistics and stores the 
metadata along with the data to improve write and query performance. Hudi 
Metadata is an internally-managed table which organizes the table metadata 
under the base path *.hoodie/metadata.* The metadata is in itself a Hudi table, 
organized with the Hudi merge-on-read storage format. Every record stored in 
the metadata table is a Hudi record and hence has partitioning key and record 
key specified. Following are the met [...]
+Hudi automatically extracts the physical data statistics and stores the 
metadata along with the data to improve write and query performance. The Hudi 
metadata table is an internally-managed table which organizes the table 
metadata under the base path *.hoodie/metadata*. The metadata is itself a Hudi 
table, organized with the Hudi merge-on-read storage format. Every record 
stored in the metadata table is a Hudi record and hence has a partitioning key 
and record key specified.
+
+The Apache Hudi platform employs the HFile format to store metadata and 
indexes to ensure high performance, though 
+different implementations are free to choose their own. The following are the 
metadata table partitions:
+
+- **files** - Partition path to file name index. The key for the Hudi record 
is the partition path, and the 
+actual record is a map of file name to an instance of 
[HoodieMetadataFileInfo][15] (refer to the schema below). 
+The files index can be used to perform file listing and filter-based pruning 
of the scan set during queries.
+
+| Schema                  | Field Name   | Data Type  | Description            
           |
+|:------------------------|:-------------|:-----------|:----------------------------------|
+| HoodieMetadataFileInfo  | `size`       | long       | size of the file       
           |
+|                         | `isDeleted`  | boolean    | whether the file has 
been deleted |
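To make the shape of this index concrete, here is a small sketch (class and function names are hypothetical, not Hudi's internal API):

```python
from dataclasses import dataclass

# Hypothetical mirror of HoodieMetadataFileInfo: file size plus deletion flag.
@dataclass
class FileInfo:
    size: int
    is_deleted: bool

# The files partition keys each record by partition path; the payload is a
# map of file name -> FileInfo, so listings avoid scanning storage directly.
files_index = {
    "2023/09/21": {
        "base_1.parquet": FileInfo(size=1024, is_deleted=False),
        "base_0.parquet": FileInfo(size=2048, is_deleted=True),  # cleaned file
    }
}

def list_files(partition: str) -> list:
    """Return the live file names recorded for a partition."""
    return [name for name, info in files_index.get(partition, {}).items()
            if not info.is_deleted]
```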
+
+- **bloom\_filters** - Bloom filter index to help map a record key to the 
actual file. The Hudi key is 
+`str_concat(hash(partition name), hash(file name))` and the actual payload is 
an instance of 
+[HudiMetadataBloomFilter][16] (refer to the schema below). The bloom filter is 
used to accelerate 
+'presence checks' validating whether a particular record is present in a file, 
which is useful during merging, 
+hash-based joins, point-lookup queries, etc.
+
+| Schema                    | Field Name     | Data Type  | Description        
                                  |
+|:--------------------------|:---------------|:-----------|:-----------------------------------------------------|
+| HudiMetadataBloomFilter   | `size`         | long       | size of the file   
                                  |
+|                           | `type`         | string     | type code of the 
bloom filter                        |
+|                           | `timestamp`    | string     | timestamp when the 
bloom filter was created/updated  |
+|                           | `bloomFilter`  | bytes      | the actual bloom 
filter for the data file            |
+|                           | `isDeleted`    | boolean    | whether the bloom 
filter entry has been deleted      |
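The key scheme above can be sketched as follows; the spec does not pin down the hash function, so MD5 is used here purely as a placeholder assumption:

```python
import hashlib

# Placeholder hash -- the spec only states the key is
# str_concat(hash(partition name), hash(file name)).
def _hash(value: str) -> str:
    return hashlib.md5(value.encode("utf-8")).hexdigest()

def bloom_filter_key(partition_name: str, file_name: str) -> str:
    """Build the bloom_filters partition key for one data file."""
    return _hash(partition_name) + _hash(file_name)
```

Hashing each component separately keeps the key fixed-length regardless of how long the partition path or file name is.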
+
+- **column\_stats** - Contains statistics of columns for all the records in 
the table. This enables 
+fine-grained file pruning for filters and join conditions in the query. The 
actual payload is an instance of 
+[HoodieMetadataColumnStats][17] (refer to the schema below).
+
+| Schema                      | Field Name               | Data Type           
                       | Description                                   |
+|:----------------------------|:-------------------------|:-------------------------------------------|:----------------------------------------------|
+| HoodieMetadataColumnStats   | `fileName`               | string              
                       | file name for which the column stat applies   |
+|                             | `columnName`             | string              
                        | column name for which the column stat applies |
+|                             | `minValue`               | [Wrapper type][19] 
(based on data schema)  | minimum value of the column in the file       |
+|                             | `maxValue`               | [Wrapper type][19] 
(based on data schema)  | maximum value of the column in the file       |
+|                             | `valueCount`             | long                
                       | total count of values                         |
+|                             | `nullCount`              | long                
                       | total count of null values                    |
+|                             | `totalSize`              | long                
                       | total storage size on disk                    |
+|                             | `totalUncompressedSize`  | long                
                       | total uncompressed storage size on disk       |
+|                             | `isDeleted`              | boolean             
                        | whether the column stat entry has been deleted |
+
+- **record\_index** - Contains information about record keys and their 
location in the dataset. This improves 
+the performance of updates, since it provides file locations for the updated 
records, and also enables fine-grained 
+file pruning for filters and join conditions in the query. The payload is an 
instance of 
+[HoodieRecordIndexInfo][18] (refer to the schema below).
+
+| Schema                  | Field Name           | Data Type  | Description    
                                                                                
                                                                                
                                               |
+|:------------------------|:---------------------|:-----------|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| HoodieRecordIndexInfo   | `partitionName`      | string     | partition name 
to which the record belongs                                                     
                                                                                
                                               |
+|                         | `fileIdEncoding`     | int        | determines the 
fields used to deduce file id. When the encoding is 0, file Id can be deduced 
from fileIdLowBits, fileIdHighBits and fileIndex. When encoding is 1, file Id 
is available in raw string format in fileId field  |
+|                         | `fileId`             | string     | file id in raw 
string format is available when encoding is set to 1                            
                                                                                
                                               |
+|                         | `fileIdHighBits`     | long       | file Id can be 
deduced as {UUID}-{fileIndex} when encoding is set to 0. fileIdHighBits and 
fileIdLowBits form the UUID                                                     
                                                   |
+|                         | `fileIdLowBits`      | long       | file Id can be 
deduced as {UUID}-{fileIndex} when encoding is set to 0. fileIdHighBits and 
fileIdLowBits form the UUID                                                     
                                                   |
+|                         | `fileIndex`          | int        | index suffix 
of the file Id; the file Id can be deduced as {UUID}-{fileIndex} when encoding 
is set to 0 |
+|                         | `instantTime`        | long       | Epoch time in 
milliseconds representing the commit time at which the record was added |
 
-- **files** - Partition path to file name index. Key for the Hudi record is 
the partition path and the actual record is a map of file name to an instance 
of [HoodieMetadataFileInfo][15]. The files index can be used to do file listing 
and do filter based pruning of the scanset during query
-- **bloom\_filters** - Bloom filter index to help map a record key to the 
actual file. The Hudi key is `str_concat(hash(partition name), hash(file 
name))` and the actual payload is an instance of [HudiMetadataBloomFilter][16]. 
Bloom filter is used to accelerate 'presence checks' validating whether 
particular record is present in the file, which is used during merging, 
hash-based joins, point-lookup queries, etc.
-- **column\_stats** - contains statistics of columns for all the records in 
the table. This enables fine grained file pruning for filters and join 
conditions in the query. The actual payload is an instance of 
[HoodieMetadataColumnStats][17]. 
-
-Apache Hudi platform employs HFile format, to store metadata and indexes, to 
ensure high performance, though different implementations are free to choose 
their own. 
 
 ## File Layout Hierarchy
 
@@ -199,19 +251,19 @@ Hudi Log format specification is as follows.
 
 ![hudi\_log\_format\_v2][image-1]
 
-| Section                | \#Bytes  | Description                              
                                                                                
                                                                                
                                                                    |
-|------------------------| -------- | 
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 |
-| **magic**              | 6        | 6 Characters '#HUDI#' stored as a byte 
array. Sanity check for block corruption to assert start 6 bytes matches the 
magic byte[].                                                                   
                                                                         |
-| **LogBlock length**    | 8        | Length of the block excluding the magic. |
-| **version**            | 4        | Version of the Log file format, monotonically increasing to support backwards compatibility |
-| **type**               | 4        | Represents the type of the log block. Id of the type is serialized as an Integer. |
-| **header length**      | 8        | Length of the header section to follow |
-| **header**             | variable | Custom serialized map of header metadata entries. 4 bytes of map size that denotes number of entries, then for each entry 4 bytes of metadata type, followed by length/bytearray of variable length utf-8 string. |
-| **content length**     | 8        | Length of the actual content serialized |
-| **content**            | variable | The content contains the serialized records in one of the supported file formats (Apache Avro, Apache Parquet or Apache HFile) |
-| **footer length**      | 8        | Length of the footer section to follow |
-| **footer**             | variable | Similar to Header. Map of footer metadata entries. |
-| **total block length** | 8        | Total size of the block including the magic bytes. This is used to determine if a block is corrupt by comparing to the block size in the header. Each log block assumes that the block size will be last data written in a block. Any data if written after is just ignored. |
+| Section                | \#Bytes  | Description |
+|:-----------------------|:---------|:------------|
+| **magic**              | 6        | The 6 characters '#HUDI#' stored as a byte array. Sanity check for block corruption: asserts that the first 6 bytes match the magic byte[]. |
+| **LogBlock length**    | 8        | Length of the block, excluding the magic. |
+| **version**            | 4        | Version of the log file format, monotonically increasing to support backwards compatibility. |
+| **type**               | 4        | Type of the log block. The id of the type is serialized as an integer. |
+| **header length**      | 8        | Length of the header section to follow. |
+| **header**             | variable | Custom serialized map of header metadata entries: 4 bytes of map size denoting the number of entries, then for each entry 4 bytes of metadata type, followed by a length-prefixed byte array of a variable-length UTF-8 string. |
+| **content length**     | 8        | Length of the serialized content. |
+| **content**            | variable | The serialized records in one of the supported file formats (Apache Avro, Apache Parquet or Apache HFile). |
+| **footer length**      | 8        | Length of the footer section to follow. |
+| **footer**             | variable | Similar to the header: a map of footer metadata entries. |
+| **total block length** | 8        | Total size of the block, including the magic bytes. Used to determine whether a block is corrupt, by comparing against the block size in the header. Each log block assumes the block size is the last data written in a block; any data written after it is ignored. |
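To make the fixed-width prefix of the layout above concrete, here is a minimal parsing sketch. It is illustrative, not a Hudi API: `read_log_block_prefix` is a hypothetical helper, and big-endian byte order (as written by Java's `DataOutputStream`) is an assumption.

```python
import struct

MAGIC = b"#HUDI#"  # 6-byte sentinel at the start of every log block

def read_log_block_prefix(buf: bytes):
    """Parse the fixed-size prefix of a log block per the table above:
    magic (6 bytes), block length (8), version (4), type (4).
    Field widths come from the spec; names are illustrative."""
    if buf[:6] != MAGIC:
        raise ValueError("corrupt block: magic bytes do not match")
    # ">qii" = big-endian long + two ints, starting right after the magic.
    block_length, version, block_type = struct.unpack_from(">qii", buf, 6)
    return block_length, version, block_type

# Build a tiny synthetic prefix and parse it back.
prefix = MAGIC + struct.pack(">qii", 1024, 1, 4)
print(read_log_block_prefix(prefix))  # (1024, 1, 4)
```

The magic check is what lets a reader classify a block whose advertised and actual sizes disagree as corrupt, as described in the **total block length** row.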
 
 Metadata key mapping from Integer to actual metadata is as follows
 
@@ -233,11 +285,11 @@ Encodes a command to the log reader. The Command block must be 0 byte content bl
 
 ![spec\_log\_format\_delete\_block][image-2]
 
-| Section        | \#bytes  | Description |
-| -------------- | -------- | ----------- |
-| format version | 4        | version of the log file format |
-| length         | 8        | length of the deleted keys section to follow |
-| deleted keys   | variable | Tombstone of the record to encode a delete.  The following 3 fields are serialized using the KryoSerializer.  **Record Key** - Unique record key within the partition to deleted **Partition Path** - Partition path of the record deleted **Ordering Value** - In a particular batch of updates, the delete block is always written after the data (Avro/HFile/Parquet) block. This field would preserve the ordering of deletes and inserts within the same batch. |
+| Section        | \#bytes  | Description |
+|:---------------|:---------|:------------|
+| format version | 4        | Version of the log file format. |
+| length         | 8        | Length of the deleted keys section to follow. |
+| deleted keys   | variable | Tombstone of the record to encode a delete. The following 3 fields are serialized using the KryoSerializer. **Record Key** - unique record key, within the partition, of the record to be deleted. **Partition Path** - partition path of the deleted record. **Ordering Value** - in a particular batch of updates, the delete block is always written after the data (Avro/HFile/Parquet) block; this field preserves the ordering of deletes and inserts within the same batch. |
 
 ##### Corrupted Block (Id: 3)
 
@@ -249,12 +301,12 @@ Data block serializes the actual records written into the log file
 
 ![spec\_log\_format\_avro\_block][image-3]
 
-| Section        | \#bytes  | Description |
-| -------------- | -------- | ----------- |
-| format version | 4        | version of the log file format |
-| record count   | 4        | total number of records in this block |
-| record length  | 8        | length of the record content to follow |
-| record content | variable | Record represented as an Avro record serialized using BinaryEncoder |
+| Section        | \#bytes  | Description |
+|:---------------|:---------|:------------|
+| format version | 4        | Version of the log file format. |
+| record count   | 4        | Total number of records in this block. |
+| record length  | 8        | Length of the record content to follow. |
+| record content | variable | Records represented as Avro records serialized using the BinaryEncoder. |
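A round-trip sketch of one plausible reading of this layout (length-prefixed records after a version and count) may help. This is illustrative only: the helper names are hypothetical, big-endian encoding is an assumption, and arbitrary byte payloads stand in for Avro-BinaryEncoder-serialized records.

```python
import struct

def encode_avro_data_block(records, format_version=1):
    """Assemble an Avro data block body per the table above:
    format version (4 bytes), record count (4), then each record
    as an 8-byte length followed by its serialized content."""
    out = struct.pack(">ii", format_version, len(records))
    for rec in records:
        out += struct.pack(">q", len(rec)) + rec
    return out

def decode_avro_data_block(buf):
    """Inverse of the encoder: walk the length-prefixed records."""
    version, count = struct.unpack_from(">ii", buf, 0)
    offset, records = 8, []
    for _ in range(count):
        (length,) = struct.unpack_from(">q", buf, offset)
        offset += 8
        records.append(buf[offset:offset + length])
        offset += length
    return records

payload = encode_avro_data_block([b"rec-1", b"rec-2"])
print(decode_avro_data_block(payload))  # [b'rec-1', b'rec-2']
```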
 
 ##### HFile Block (Id: 5)
 
@@ -307,10 +359,10 @@ A critical design choice for any table is to pick the right trade-offs in the da
 
 #### Table types
 
-|                     | Merge Efficiency | Query Efficiency [...]
-| ------------------- |------------------|------------------ [...]
-| Copy on Write (COW) | **Tunable** <br />COW table type creates a new File slice in the file-group for every batch of updates. Write amplification can be quite high when the update is spread across multiple file groups. The cost involved can be high over a time period especially on tables with low data latency requirements. | **Optimal** <br />COW table types create whole readable data files in open source columnar file formats on each merge batch, there is minimal overhead per recor [...]
-| Merge on Read (MOR) | **Optimal** <br />MOR table type batches the updates to the file slice in a separate optimized Log file, write amplification is amortized over time when sufficient updates are batched. The merge cost involved will be lower than COW since the churn on the records re-written for every update is much lower. | **Tunable**<br />MOR Table type required record level merging during query. Although there are techniques to make this merge as efficient as possible, there is  [...]
+|                     | Merge Efficiency | Query Efficiency [...]
+|:--------------------|:-----------------|:------------------ [...]
+| Copy on Write (COW) | **Tunable** <br />COW table type creates a new file slice in the file group for every batch of updates. Write amplification can be quite high when the update is spread across multiple file groups. The cost involved can be high over a time period, especially on tables with low data latency requirements. | **Optimal** <br />COW table types create whole readable data files in open source columnar file formats on each merge batch, there is minimal overhead per reco [...]
+| Merge on Read (MOR) | **Optimal** <br />MOR table type batches the updates to the file slice in a separate optimized log file, write amplification is amortized over time when sufficient updates are batched. The merge cost involved will be lower than COW since the churn on the records re-written for every update is much lower. | **Tunable**<br />MOR table type requires record level merging during query. Although there are techniques to make this merge as efficient as possible, there is [...]
 
 > An interesting observation on the MOR table type is that, by providing a
 > special view of the table which serves only the base files in the file slice
 > (the read optimized query of a MOR table), a query can pick between query
 > efficiency and data freshness dynamically at query time. Compaction frequency
 > determines the data freshness of the read optimized view. With this, MOR
 > has all the levers required to balance merge and query performance
 > dynamically. 
 
@@ -427,7 +479,9 @@ The efficiency of Optimistic concurrency is inversely proportional to the possib
 [15]:  https://github.com/apache/hudi/blob/master/hudi-common/src/main/avro/HoodieMetadata.avsc#L34
 [16]:  https://github.com/apache/hudi/blob/master/hudi-common/src/main/avro/HoodieMetadata.avsc#L66
 [17]:  https://github.com/apache/hudi/blob/master/hudi-common/src/main/avro/HoodieMetadata.avsc#L101
+[18]:  https://github.com/apache/hudi/blob/master/hudi-common/src/main/avro/HoodieMetadata.avsc#L369
+[19]:  https://github.com/apache/hudi/blob/master/hudi-common/src/main/avro/HoodieMetadata.avsc#L125
 
 [image-1]:     /assets/images/hudi_log_format_v2.png
 [image-2]:     /assets/images/spec/spec_log_format_delete_block.png
-[image-3]:     /assets/images/spec/spec_log_format_avro_block.png
\ No newline at end of file
+[image-3]:     /assets/images/spec/spec_log_format_avro_block.png

