rzhang10 commented on a change in pull request #4301: URL: https://github.com/apache/iceberg/pull/4301#discussion_r834793135
##########
File path: format/spec.md
##########
@@ -951,24 +958,45 @@ Types are serialized according to this table:
|Type|JSON representation|Example|
|--- |--- |--- |
-|**`boolean`**|`JSON string: "boolean"`|`"boolean"`|
-|**`int`**|`JSON string: "int"`|`"int"`|
-|**`long`**|`JSON string: "long"`|`"long"`|
-|**`float`**|`JSON string: "float"`|`"float"`|
-|**`double`**|`JSON string: "double"`|`"double"`|
-|**`date`**|`JSON string: "date"`|`"date"`|
-|**`time`**|`JSON string: "time"`|`"time"`|
-|**`timestamp without zone`**|`JSON string: "timestamp"`|`"timestamp"`|
-|**`timestamp with zone`**|`JSON string: "timestamptz"`|`"timestamptz"`|
-|**`string`**|`JSON string: "string"`|`"string"`|
-|**`uuid`**|`JSON string: "uuid"`|`"uuid"`|
-|**`fixed(L)`**|`JSON string: "fixed[<L>]"`|`"fixed[16]"`|
-|**`binary`**|`JSON string: "binary"`|`"binary"`|
-|**`decimal(P, S)`**|`JSON string: "decimal(<P>,<S>)"`|`"decimal(9,2)"`,<br />`"decimal(9, 2)"`|
-|**`struct`**|`JSON object: {`<br /> `"type": "struct",`<br /> `"fields": [ {`<br /> `"id": <field id int>,`<br /> `"name": <name string>,`<br /> `"required": <boolean>,`<br /> `"type": <type JSON>,`<br /> `"doc": <comment string>`<br /> `}, ...`<br /> `] }`|`{`<br /> `"type": "struct",`<br /> `"fields": [ {`<br /> `"id": 1,`<br /> `"name": "id",`<br /> `"required": true,`<br /> `"type": "uuid"`<br /> `}, {`<br /> `"id": 2,`<br /> `"name": "data",`<br /> `"required": false,`<br /> `"type": {`<br /> `"type": "list",`<br /> `...`<br /> `}`<br /> `} ]`<br />`}`|
-|**`list`**|`JSON object: {`<br /> `"type": "list",`<br /> `"element-id": <id int>,`<br /> `"element-required": <bool>`<br /> `"element": <type JSON>`<br />`}`|`{`<br /> `"type": "list",`<br /> `"element-id": 3,`<br /> `"element-required": true,`<br /> `"element": "string"`<br />`}`|
-|**`map`**|`JSON object: {`<br /> `"type": "map",`<br /> `"key-id": <key id int>,`<br /> `"key": <type JSON>,`<br /> `"value-id": <val id int>,`<br /> `"value-required": <bool>`<br /> `"value": <type JSON>`<br />`}`|`{`<br /> `"type": "map",`<br /> `"key-id": 4,`<br /> `"key": "string",`<br /> `"value-id": 5,`<br /> `"value-required": false,`<br /> `"value": "double"`<br />`}`|
-
+| **`boolean`** | `JSON string: "boolean"` | `"boolean"` |
+| **`int`** | `JSON string: "int"` | `"int"` |
+| **`long`** | `JSON string: "long"` | `"long"` |
+| **`float`** | `JSON string: "float"` | `"float"` |
+| **`double`** | `JSON string: "double"` | `"double"` |
+| **`date`** | `JSON string: "date"` | `"date"` |
+| **`time`** | `JSON string: "time"` | `"time"` |
+| **`timestamp without zone`** | `JSON string: "timestamp"` | `"timestamp"` |
+| **`timestamp with zone`** | `JSON string: "timestamptz"` | `"timestamptz"` |
+| **`string`** | `JSON string: "string"` | `"string"` |
+| **`uuid`** | `JSON string: "uuid"` | `"uuid"` |
+| **`fixed(L)`** | `JSON string: "fixed[<L>]"` | `"fixed[16]"` |
+| **`binary`** | `JSON string: "binary"` | `"binary"` |
+| **`decimal(P, S)`** | `JSON string: "decimal(<P>,<S>)"` | `"decimal(9,2)"`,<br />`"decimal(9, 2)"` |
+| **`struct`** | `JSON object: {`<br /> `"type": "struct",`<br /> `"fields": [ {`<br /> `"id": <field id int>,`<br /> `"name": <name string>,`<br /> `"required": <boolean>,`<br /> `"type": <type JSON>,`<br /> `"doc": <comment string>,`<br /> `"default": <JSON encoding of default value>`<br /> `}, ...`<br /> `] }` | `{`<br /> `"type": "struct",`<br /> `"fields": [ {`<br /> `"id": 1,`<br /> `"name": "id",`<br /> `"required": true,`<br /> `"type": "uuid",`<br /> `"default": "0db3e2a8-9d1d-42b9-aa7b-74ebe558dceb"`<br /> `}, {`<br /> `"id": 2,`<br /> `"name": "data",`<br /> `"required": false,`<br /> `"type": {`<br /> `"type": "list",`<br /> `...`<br /> `}`<br /> `} ]`<br />`}` |
+| **`list`** | `JSON object: {`<br /> `"type": "list",`<br /> `"element-id": <id int>,`<br /> `"element-required": <bool>`<br /> `"element": <type JSON>`<br />`}` | `{`<br /> `"type": "list",`<br /> `"element-id": 3,`<br /> `"element-required": true,`<br /> `"element": "string"`<br />`}` |
+| **`map`** | `JSON object: {`<br /> `"type": "map",`<br /> `"key-id": <key id int>,`<br /> `"key": <type JSON>,`<br /> `"value-id": <val id int>,`<br /> `"value-required": <bool>`<br /> `"value": <type JSON>`<br />`}` | `{`<br /> `"type": "map",`<br /> `"key-id": 4,`<br /> `"key": "string",`<br /> `"value-id": 5,`<br /> `"value-required": false,`<br /> `"value": "double"`<br />`}` |
+
+For default values, the serialization depends on the type of the corresponding column or nested field. The mapping of types and their corresponding default value JSON serialization is described in the following table:
+
+| Type | Json type | Example | Note |
+|--- |--- |--- |--- |
+| **`boolean`** | **`boolean`** | `true` | |
+| **`int`** | **`json int`** | `1` | |
+| **`long`** | **`json long`** | `1` | |
+| **`float`** | **`json float`** | `1.1` | |
+| **`double`** | **`json double`** | `1.1` | |
+| **`decimal(P,S)`** | **`string`** | `"0x3162"` | Stores the unscaled value, as the two's-complement big-endian binary using the minimum number of bytes, converted to a hexadecimal string prefixed by `0x` |
+| **`date`** | **`json int`** | `19054` | Stores days from the 1970-01-01 |
+| **`time`** | **`json long`** | `36000000000` | Stores microseconds from midnight |
+| **`timestamp`** | **`json long`** | `1646277378000000` | Stores microseconds from 1970-01-01 00:00:00.000000 |
+| **`timestamptz`** | **`json long`** | `1646277378000000` | Stores microseconds from 1970-01-01 00:00:00.000000 UTC |
+| **`string`** | **`string`** | `"foo"` | |
+| **`uuid`** | **`string`** | `"eb26bdb1-a1d8-4aa6-990e-da940875492c"` | Stores the lowercase uuid string |
+| **`fixed(L)`** | **`string`** | `"0x3162"` | Stored as a hexadecimal byte literal string, prefixed by `0x` |
+| **`binary`** | **`string`** | `"0x3162"` | Stored as a hexadecimal byte literal string, prefixed by `0x` |
+| **`struct`** | **`object`** | `{"a": 1, "foo": "bar"}` | Use a JSON map to represent struct data, the keys are the nested fields' names in the struct schema, and the values are value literals of corresponding fields' type |
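For readers following the hunk, here is a rough sketch of how a few of the encodings in the proposed table could be computed. It is written in Python for brevity and is not taken from the PR; every function name is hypothetical, and the decimal helper assumes the value has already been quantized to the column's scale `S`.

```python
import datetime
from decimal import Decimal

# Naive epoch values; timestamptz would first need converting to UTC.
EPOCH_DATE = datetime.date(1970, 1, 1)
EPOCH_TS = datetime.datetime(1970, 1, 1)


def hex_literal(data: bytes) -> str:
    """fixed/binary: hexadecimal byte literal prefixed by 0x."""
    return "0x" + data.hex()


def decimal_default(value: Decimal) -> str:
    """decimal: unscaled value as minimal two's-complement big-endian hex.

    Assumption: `value` is finite and already has the column's scale.
    """
    sign, digits, _exponent = value.as_tuple()
    unscaled = int("".join(map(str, digits)))
    if sign:
        unscaled = -unscaled
    # Magnitude bits plus one sign bit, rounded up to whole bytes.
    bits = unscaled.bit_length() if unscaled >= 0 else (~unscaled).bit_length()
    return hex_literal(unscaled.to_bytes(bits // 8 + 1, "big", signed=True))


def date_default(value: datetime.date) -> int:
    """date: days from 1970-01-01."""
    return (value - EPOCH_DATE).days


def time_default(value: datetime.time) -> int:
    """time: microseconds from midnight."""
    seconds = (value.hour * 60 + value.minute) * 60 + value.second
    return seconds * 1_000_000 + value.microsecond


def timestamp_default(value: datetime.datetime) -> int:
    """timestamp (without zone): microseconds from 1970-01-01 00:00:00."""
    delta = value - EPOCH_TS
    return (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds
```

With these helpers, `decimal_default(Decimal("126.42"))` and `hex_literal(b"1b")` both give `"0x3162"`, `date_default(datetime.date(2022, 3, 3))` gives `19054`, and `timestamp_default(datetime.datetime(2022, 3, 3, 3, 16, 18))` gives `1646277378000000`, matching the example column.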
Review comment:
> My preference is to encode the default value that the user supplied and add the defaults of child fields to that value when fields are not present. For initial default values, we would use initial defaults. And for write defaults we would use write defaults.
Are you saying that if a default value is missing for an outer struct, its children's default values should be propagated automatically to the struct's own level?
I feel that adding this propagation logic (traversing either up or down the tree) would be too complicated. I have a different idea: how about we treat all default values as independent?
Let's say there is a struct called `foo` with one child called `foo.bar`.
If the user queries and projects `foo`, i.e. `select foo`, we use the default value at the `foo` level.
If the user queries and projects `foo.bar`, i.e. `select foo.bar`, we use the default value at the `foo.bar` level.
Basically, I'm saying that the default values of `foo` and `foo.bar` are independent, and users set those two values independently.
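To make the idea concrete, here is a minimal sketch, in Python for brevity. `DEFAULTS` and `default_for_projection` are hypothetical names used only for illustration, not Iceberg APIs. The default is looked up by the exact projected path, with no propagation up or down the tree:

```python
from typing import Any, Dict, Optional

# Hypothetical registry: each field path carries its own default,
# set by the user independently of its parent or children.
DEFAULTS: Dict[str, Any] = {
    "foo": {"bar": 1},  # default set on the struct itself
    "foo.bar": 42,      # default set on the child, independently
}


def default_for_projection(
    path: str, defaults: Dict[str, Any] = DEFAULTS
) -> Optional[Any]:
    """Return the default registered for exactly this path, or None.

    No fallback to parent or child defaults is attempted.
    """
    return defaults.get(path)
```

Here `default_for_projection("foo")` returns `{"bar": 1}` and `default_for_projection("foo.bar")` returns `42`. The two values are allowed to disagree, which is the trade-off for not having any propagation logic.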
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
