This is an automated email from the ASF dual-hosted git repository.
stevenwu pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/iceberg.git
The following commit(s) were added to refs/heads/main by this push:
new ab580b9955 Spec: remove the JSON spec for content file and file scan task sections. (#9771)
ab580b9955 is described below
commit ab580b9955ade2c4a755d5b8e150058088a48c2a
Author: Steven Zhen Wu <[email protected]>
AuthorDate: Mon Jul 15 21:02:35 2024 -0700
Spec: remove the JSON spec for content file and file scan task sections. (#9771)
They shouldn't be part of the core table spec although the JSON serializer
is valuable for FileScanTask serialization. See discussion thread for more
context: https://lists.apache.org/thread/2ty27yx4q0zlqd5h71cyyhb5k47yf9bv
---
format/spec.md | 36 ------------------------------------
1 file changed, 36 deletions(-)
diff --git a/format/spec.md b/format/spec.md
index 9a3c16e3ac..dd4e901f37 100644
--- a/format/spec.md
+++ b/format/spec.md
@@ -1230,42 +1230,6 @@ Example
] } ]
```
-### Content File (Data and Delete) Serialization
-
-Content file (data or delete) is serialized as a JSON object according to the following table.
-
-| Metadata field |JSON representation|Example|
-|--------------------------|--- |--- |
-| **`spec-id`** |`JSON int`|`1`|
-| **`content`** |`JSON string`|`DATA`, `POSITION_DELETES`, `EQUALITY_DELETES`|
-| **`file-path`** |`JSON string`|`"s3://b/wh/data.db/table"`|
-| **`file-format`** |`JSON string`|`AVRO`, `ORC`, `PARQUET`|
-| **`partition`** |`JSON object: Partition data tuple using partition field ids for the struct field ids`|`{"1000":1}`|
-| **`record-count`** |`JSON long`|`1`|
-| **`file-size-in-bytes`** |`JSON long`|`1024`|
-| **`column-sizes`** |`JSON object: Map from column id to the total size on disk of all regions that store the column.`|`{"keys":[3,4],"values":[100,200]}`|
-| **`value-counts`** |`JSON object: Map from column id to number of values in the column (including null and NaN values)`|`{"keys":[3,4],"values":[90,180]}`|
-| **`null-value-counts`** |`JSON object: Map from column id to number of null values in the column`|`{"keys":[3,4],"values":[10,20]}`|
-| **`nan-value-counts`** |`JSON object: Map from column id to number of NaN values in the column`|`{"keys":[3,4],"values":[0,0]}`|
-| **`lower-bounds`** |`JSON object: Map from column id to lower bound binary in the column serialized as hexadecimal string`|`{"keys":[3,4],"values":["01000000","02000000"]}`|
-| **`upper-bounds`** |`JSON object: Map from column id to upper bound binary in the column serialized as hexadecimal string`|`{"keys":[3,4],"values":["05000000","0A000000"]}`|
-| **`key-metadata`** |`JSON string: Encryption key metadata binary serialized as hexadecimal string`|`00000000000000000000000000000000`|
-| **`split-offsets`** |`JSON list of long: Split offsets for the data file`|`[128,256]`|
-| **`equality-ids`** |`JSON list of int: Field ids used to determine row equality in equality delete files`|`[1]`|
-| **`sort-order-id`** |`JSON int`|`1`|
-
-### File Scan Task Serialization
-
-File scan task is serialized as a JSON object according to the following table.
-
-| Metadata field |JSON representation|Example|
-|--------------------------|--- |--- |
-| **`schema`** |`JSON object`|`See above, read schemas instead`|
-| **`spec`** |`JSON object`|`See above, read partition specs instead`|
-| **`data-file`** |`JSON object`|`See above, read content file instead`|
-| **`delete-files`** |`JSON list of objects`|`See above, read content file instead`|
-| **`residual-filter`** |`JSON object: residual filter expression`|`{"type":"eq","term":"id","value":1}`|
-
## Appendix D: Single-value serialization
### Binary single-value serialization