rdblue commented on a change in pull request #4301:
URL: https://github.com/apache/iceberg/pull/4301#discussion_r823197799



##########
File path: format/spec.md
##########
@@ -193,10 +193,38 @@ Notes:
 
 For details on how to serialize a schema to JSON, see Appendix C.
 
+#### Default value
+Default values can be assigned to top-level columns or nested fields. Default 
values are used during schema evolution when adding a new column. The default 
value is used to read rows belonging to the files that lack the column or 
nested field prior to the schema evolution.
+
+Currently, when a default value for a column or nested field is set, it is 
considered an incompatible change to change it to another value. However, 
changing default values is allowed when calling the `allowIncompatibleChanges` 
API explicitly. Changing default values is discouraged since the occurrence of 
a rewrite may determine whether the new or old default value is returned during 
the read.
+
+Default values are encoded in JSON format. The representation depends on the 
type of the corresponding field. The mapping of types and their corresponding 
default value JSON representation is described in the following table.
+
+| type               | json type     | example                                
| note                                                                          
                                                                                
                                                                               |
+|--------------------|---------------|----------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| **`boolean`**      | **`boolean`** | true                                   
|                                                                               
                                                                                
                                                                               |
+| **`int`**          | **`number`**  | 1                                      
|                                                                               
                                                                                
                                                                               |
+| **`long`**         | **`number`**  | 1                                      
|                                                                               
                                                                                
                                                                               |
+| **`float`**        | **`number`**  | 1.1                                    
|                                                                               
                                                                                
                                                                               |
+| **`double`**       | **`number`**  | 1.1                                    
|                                                                               
                                                                                
                                                                               |
+| **`decimal(P,S)`** | **`string`**  | "0x3162"                               
| we use hexadecimal byte literals to encode bytes, with prefix `0x`, the byte 
array contain the two's-complement representation of the `unscaled` integer 
value in big-endian byte order, the actual value will be `unscaled * 10 ^ 
(-scale)` |

Review comment:
       In Iceberg, we avoid using personal pronouns ("we" or "I") in specs and 
docs. It is unclear who "we" is, and avoiding it ends up producing more direct 
documentation that is easily understood.
   
   For example, here you should state that the values is "Stored as the 
two's-complement big-endian binary using the minimum number of bytes, converted 
to a hexadecimal string prefixed by 0x".
   
   You should use wording mostly taken from Appendix D and adapted for JSON.
   
   @RussellSpitzer, what do you think about base64 encoding or other 
representations? Do we care about the size here?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to