This is an automated email from the ASF dual-hosted git repository.

emkornfield pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/parquet-format.git


The following commit(s) were added to refs/heads/master by this push:
     new 786142e  Improve wording about error cases in VariantShredding (#523)
786142e is described below

commit 786142e26740487930ddc3ec5e39d780bd930907
Author: Dmytro Tsyliuryk
AuthorDate: Mon Oct 20 22:39:23 2025 +0200

    Improve wording about error cases in VariantShredding (#523)
---
 VariantShredding.md | 33 +++++++++++++++++----------------
 1 file changed, 17 insertions(+), 16 deletions(-)

diff --git a/VariantShredding.md b/VariantShredding.md
index 4a98a31..cfc05fd 100644
--- a/VariantShredding.md
+++ b/VariantShredding.md
@@ -22,7 +22,7 @@
 The Variant type is designed to store and process semi-structured data efficiently, even with heterogeneous values.
 Query engines encode each Variant value in a self-describing format, and store it as a group containing `value` and `metadata` binary fields in Parquet.
 Since data is often partially homogeneous, it can be beneficial to extract certain fields into separate Parquet columns to further improve performance.
-This process is **shredding**.
+This process is called **shredding**.
 
 Shredding enables the use of Parquet's columnar representation for more compact data encoding, column statistics for data skipping, and partial projections.
 
@@ -202,21 +202,22 @@ As a result, reads when both `value` and `typed_value` are defined may be incons
 
 The table below shows how the series of objects in the first column would be stored:
 
-| Event object                                                                        | `value`                           | `typed_value` | `typed_value.event_type.value` | `typed_value.event_type.typed_value` | `typed_value.event_ts.value` | `typed_value.event_ts.typed_value` | Notes                                            |
-|-------------------------------------------------------------------------------------|-----------------------------------|---------------|--------------------------------|--------------------------------------|------------------------------|------------------------------------|--------------------------------------------------|
-| `{"event_type": "noop", "event_ts": 1729794114937}`                                 | null                              | non-null      | null                           | `noop`                               | null                         | 1729794114937                      | Fully shredded object                            |
-| `{"event_type": "login", "event_ts": 1729794146402, "email": "[email protected]"}`  | `{"email": "[email protected]"}`  | non-null      | null                           | `login`                              | null                         | 1729794146402                      | Partially shredded object                        |
-| `{"error_msg": "malformed: ..."}`                                                   | `{"error_msg", "malformed: ..."}` | non-null      | null                           | null                                 | null                         | null                               | Object with all shredded fields missing          |
-| `"malformed: not an object"`                                                        | `malformed: not an object`        | null          |                                |                                      |                              |                                    | Not an object (stored as Variant string)         |
-| `{"event_ts": 1729794240241, "click": "_button"}`                                   | `{"click": "_button"}`            | non-null      | null                           | null                                 | null                         | 1729794240241                      | Field `event_type` is missing                    |
-| `{"event_type": null, "event_ts": 1729794954163}`                                   | null                              | non-null      | `00` (field exists, is null)   | null                                 | null                         | 1729794954163                      | Field `event_type` is present and is null        |
-| `{"event_type": "noop", "event_ts": "2024-10-24"}`                                  | null                              | non-null      | null                           | `noop`                               | `"2024-10-24"`               | null                               | Field `event_ts` is present but not a timestamp  |
-| `{ }`                                                                               | null                              | non-null      | null                           | null                                 | null                         | null                               | Object is present but empty                      |
-| null                                                                                | `00` (null)                       | null          |                                |                                      |                              |                                    | Object/value is null                             |
-| missing                                                                             | null                              | null          |                                |                                      |                              |                                    | Object/value is missing                          |
-| INVALID                                                                             | `{"event_type": "login"}`         | non-null      | null                           | `login`                              | null                         | 1729795057774                      | INVALID: Shredded field is present in `value`    |
-| INVALID                                                                             | `"a"`                             | non-null      | null                           | null                                 | null                         | null                               | INVALID: `typed_value` is present for non-object |
-| INVALID                                                                             | `02 00` (object with 0 fields)    | null          |                                |                                      |                              |                                    | INVALID: `typed_value` is null for object        |
+| Event object                                                                       | `value`                           | `typed_value` | `typed_value.event_type.value` | `typed_value.event_type.typed_value` | `typed_value.event_ts.value` | `typed_value.event_ts.typed_value` | Notes                                                                      |
+|------------------------------------------------------------------------------------|-----------------------------------|---------------|--------------------------------|--------------------------------------|------------------------------|------------------------------------|----------------------------------------------------------------------------|
+| `{"event_type": "noop", "event_ts": 1729794114937}`                                | null                              | non-null      | null                           | `noop`                               | null                         | 1729794114937                      | Fully shredded object                                                      |
+| `{"event_type": "login", "event_ts": 1729794146402, "email": "[email protected]"}` | `{"email": "[email protected]"}`  | non-null      | null                           | `login`                              | null                         | 1729794146402                      | Partially shredded object                                                  |
+| `{"error_msg": "malformed: ..."}`                                                  | `{"error_msg", "malformed: ..."}` | non-null      | null                           | null                                 | null                         | null                               | Object with all shredded fields missing                                    |
+| `"malformed: not an object"`                                                       | `malformed: not an object`        | null          |                                |                                      |                              |                                    | Not an object (stored as Variant string)                                   |
+| `{"event_ts": 1729794240241, "click": "_button"}`                                  | `{"click": "_button"}`            | non-null      | null                           | null                                 | null                         | 1729794240241                      | Field `event_type` is missing                                              |
+| `{"event_type": null, "event_ts": 1729794954163}`                                  | null                              | non-null      | `00` (field exists, is null)   | null                                 | null                         | 1729794954163                      | Field `event_type` is present and is null                                  |
+| `{"event_type": "noop", "event_ts": "2024-10-24"}`                                 | null                              | non-null      | null                           | `noop`                               | `"2024-10-24"`               | null                               | Field `event_ts` is present but not a timestamp                            |
+| `{ }`                                                                              | null                              | non-null      | null                           | null                                 | null                         | null                               | Object is present but empty                                                |
+| null                                                                               | `00` (null)                       | null          |                                |                                      |                              |                                    | Object/value is null                                                       |
+| missing                                                                            | null                              | null          |                                |                                      |                              |                                    | Object/value is missing                                                    |
+| INVALID: `{"event_type": "login", "event_ts": 1729795057774}`                      | `{"event_type": "login"}`         | non-null      | null                           | `login`                              | null                         | 1729795057774                      | INVALID: Shredded field is present in `value`                              |
+| INVALID: `{"event_type": "login"}`                                                 | `{"event_type": "login"}`         | null          |                                |                                      |                              |                                    | INVALID: Shredded field is present in `value`, while `typed_value` is null |
+| INVALID: `"a"`                                                                     | `"a"`                             | non-null      | null                           | null                                 | null                         | null                               | INVALID: `typed_value` is present and `value` is not an object             |
+| INVALID: `{}`                                                                      | `02 00` (object with 0 fields)    | null          |                                |                                      |                              |                                    | INVALID: `typed_value` is null for object                                  |
 
 Invalid cases in the table above must not be produced by writers.
 Readers must return an object when `typed_value` is non-null containing the shredded fields.
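
The writer and reader rules in the table can be illustrated with a short Python sketch. It models one row of a shredded object column using decoded Python values rather than the binary Variant encoding. `None` stands for a null Parquet column, a hypothetical `VARIANT_NULL` sentinel stands for the Variant null `00`, a `dict` stands for a Variant object, and each shredded field is a `(value, typed_value)` pair. The names `validate`, `reconstruct`, and `VARIANT_NULL` are illustrative assumptions and are not part of any Parquet library API. The sketch also covers only one level of shredding with non-object field types.

```python
from typing import Any, Dict, Optional, Set, Tuple


class _VariantNull:
    """Hypothetical stand-in for the Variant null value (encoded as `00`).

    Kept distinct from Python None, which models a null Parquet column.
    """

    def __repr__(self) -> str:
        return "VARIANT_NULL"


VARIANT_NULL = _VariantNull()

# A shredded field is a (value, typed_value) pair; None means that column is null.
Field = Tuple[Any, Any]


def validate(value: Any, typed_value: Optional[Dict[str, Field]],
             shredded_fields: Set[str]) -> Optional[str]:
    """Return a description of the INVALID case a row matches, or None if valid."""
    if typed_value is not None:
        # Object was shredded: value must be null or hold only un-shredded fields.
        if value is None:
            return None
        if not isinstance(value, dict):
            return "typed_value is present and value is not an object"
        if set(value) & shredded_fields:
            return "shredded field is present in value"
        return None
    # typed_value is null: value must not be an object.
    if isinstance(value, dict):
        if set(value) & shredded_fields:
            return "shredded field is present in value, while typed_value is null"
        return "typed_value is null for object"
    return None


def reconstruct(value: Any, typed_value: Optional[Dict[str, Field]]) -> Any:
    """Rebuild the logical value of a valid row; None means the value is missing."""
    if typed_value is None:
        # Not an object: a Variant value, VARIANT_NULL, or None (missing).
        return value
    result: Dict[str, Any] = {}
    for name, (field_value, field_typed) in typed_value.items():
        if field_typed is not None:
            # Assumption: if both columns were set (not valid), prefer typed_value.
            result[name] = field_typed
        elif field_value is not None:
            # Covers VARIANT_NULL (field present, is null) and type mismatches.
            result[name] = field_value
        # Both columns null: the field is missing from this object.
    if isinstance(value, dict):
        # Partially shredded object: the remaining fields come from value.
        result.update(value)
    return result


if __name__ == "__main__":
    shredded = {"event_type", "event_ts"}
    # Row "Field `event_type` is missing" from the table above.
    fields = {"event_type": (None, None), "event_ts": (None, 1729794240241)}
    print(validate({"click": "_button"}, fields, shredded))  # None
    print(reconstruct({"click": "_button"}, fields))  # {'event_ts': 1729794240241, 'click': '_button'}
    # INVALID row: typed_value is null for an object.
    print(validate({}, None, shredded))  # typed_value is null for object
```

Keeping validation separate from reconstruction mirrors the split in the text: writers must never produce the INVALID rows, while readers only need the reconstruction rule for valid input.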
