(iceberg) branch main updated: Spec: remove the JSON spec for content file and file scan task sections. (#9771)

stevenwu Mon, 15 Jul 2024 21:03:03 -0700

This is an automated email from the ASF dual-hosted git repository.

stevenwu pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/iceberg.git



The following commit(s) were added to refs/heads/main by this push:
     new ab580b9955 Spec: remove the JSON spec for content file and file scan task sections. (#9771)
ab580b9955 is described below

commit ab580b9955ade2c4a755d5b8e150058088a48c2a
Author: Steven Zhen Wu <[email protected]>
AuthorDate: Mon Jul 15 21:02:35 2024 -0700

    Spec: remove the JSON spec for content file and file scan task sections. (#9771)
    
    They shouldn't be part of the core table spec although the JSON serializer is valuable for FileScanTask serialization. See discussion thread for more context: https://lists.apache.org/thread/2ty27yx4q0zlqd5h71cyyhb5k47yf9bv
---
 format/spec.md | 36 ------------------------------------
 1 file changed, 36 deletions(-)

diff --git a/format/spec.md b/format/spec.md
index 9a3c16e3ac..dd4e901f37 100644
--- a/format/spec.md
+++ b/format/spec.md
@@ -1230,42 +1230,6 @@ Example
      ] } ]
 ```
 
-### Content File (Data and Delete) Serialization
-
-Content file (data or delete) is serialized as a JSON object according to the following table.
-
-| Metadata field           |JSON representation|Example|
-|--------------------------|--- |--- |
-| **`spec-id`**            |`JSON int`|`1`|
-| **`content`**            |`JSON string`|`DATA`, `POSITION_DELETES`, `EQUALITY_DELETES`|
-| **`file-path`**          |`JSON string`|`"s3://b/wh/data.db/table"`|
-| **`file-format`**        |`JSON string`|`AVRO`, `ORC`, `PARQUET`|
-| **`partition`**          |`JSON object: Partition data tuple using partition field ids for the struct field ids`|`{"1000":1}`|
-| **`record-count`**       |`JSON long`|`1`|
-| **`file-size-in-bytes`** |`JSON long`|`1024`|
-| **`column-sizes`**       |`JSON object: Map from column id to the total size on disk of all regions that store the column.`|`{"keys":[3,4],"values":[100,200]}`|
-| **`value-counts`**       |`JSON object: Map from column id to number of values in the column (including null and NaN values)`|`{"keys":[3,4],"values":[90,180]}`|
-| **`null-value-counts`**  |`JSON object: Map from column id to number of null values in the column`|`{"keys":[3,4],"values":[10,20]}`|
-| **`nan-value-counts`**   |`JSON object: Map from column id to number of NaN values in the column`|`{"keys":[3,4],"values":[0,0]}`|
-| **`lower-bounds`**       |`JSON object: Map from column id to lower bound binary in the column serialized as hexadecimal string`|`{"keys":[3,4],"values":["01000000","02000000"]}`|
-| **`upper-bounds`**       |`JSON object: Map from column id to upper bound binary in the column serialized as hexadecimal string`|`{"keys":[3,4],"values":["05000000","0A000000"]}`|
-| **`key-metadata`**       |`JSON string: Encryption key metadata binary serialized as hexadecimal string`|`00000000000000000000000000000000`|
-| **`split-offsets`**      |`JSON list of long: Split offsets for the data file`|`[128,256]`|
-| **`equality-ids`**       |`JSON list of int: Field ids used to determine row equality in equality delete files`|`[1]`|
-| **`sort-order-id`**      |`JSON int`|`1`|
-
-### File Scan Task Serialization
-
-File scan task is serialized as a JSON object according to the following table.
-
-| Metadata field       |JSON representation|Example|
-|--------------------------|--- |--- |
-| **`schema`**          |`JSON object`|`See above, read schemas instead`|
-| **`spec`**            |`JSON object`|`See above, read partition specs instead`|
-| **`data-file`**       |`JSON object`|`See above, read content file instead`|
-| **`delete-files`**    |`JSON list of objects`|`See above, read content file instead`|
-| **`residual-filter`** |`JSON object: residual filter expression`|`{"type":"eq","term":"id","value":1}`|
-
 ## Appendix D: Single-value serialization
 
 ### Binary single-value serialization
