rdblue commented on a change in pull request #3482:
URL: https://github.com/apache/iceberg/pull/3482#discussion_r765344052
##########
File path: site/docs/spark-queries.md
##########
@@ -191,57 +192,166 @@ join prod.db.table.snapshots s
on h.snapshot_id = s.snapshot_id
order by made_current_at
```
-```text
-+-------------------------+-----------+----------------+---------------------+----------------------------------+
-| made_current_at | operation | snapshot_id | is_current_ancestor |
summary[spark.app.id] |
-+-------------------------+-----------+----------------+---------------------+----------------------------------+
+
+<div class="markdown-table-container" markdown="block">
+| made_current_at | operation | snapshot_id | is_current_ancestor |
summary[spark.app.id] |
+| -- | -- | -- | -- | -- |
| 2019-02-08 03:29:51.215 | append | 57897183625154 | true |
application_1520379288616_155055 |
| 2019-02-09 16:24:30.13 | delete | 29641004024753 | false |
application_1520379288616_151109 |
| 2019-02-09 16:32:47.336 | append | 57897183625154 | true |
application_1520379288616_155055 |
| 2019-02-08 03:47:55.948 | overwrite | 51792995261850 | true |
application_1520379288616_152431 |
-+-------------------------+-----------+----------------+---------------------+----------------------------------+
-```
+</div>
### Files
-To show a table's data files and each file's metadata, run:
+To show a table's current data files and each file's metadata, run:
```sql
SELECT * FROM prod.db.table.files
```
-```text
-+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+------------------+-----------------+-----------------+--------------+---------------+
-| file_path |
file_format | record_count | file_size_in_bytes | column_sizes |
value_counts | null_value_counts | nan_value_counts | lower_bounds |
upper_bounds | key_metadata | split_offsets |
-+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+------------------+-----------------+-----------------+--------------+---------------+
+
+<div class="markdown-table-container" markdown="block">
+| file_path | file_format | record_count | file_size_in_bytes | column_sizes |
value_counts | null_value_counts | nan_value_counts | lower_bounds |
upper_bounds | key_metadata | split_offsets |
+| -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- |
| s3:/.../table/data/00000-3-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [] | [1 -> , 2 -> c] | [1 -> , 2 ->
c] | null | [4] |
| s3:/.../table/data/00001-4-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [] | [1 -> , 2 -> b] | [1 -> , 2 ->
b] | null | [4] |
| s3:/.../table/data/00002-5-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [] | [1 -> , 2 -> a] | [1 -> , 2 ->
a] | null | [4] |
-+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+------------------+-----------------+-----------------+--------------+---------------+
-```
+</div>
### Manifests
-To show a table's file manifests and each file's metadata, run:
+To show a table's current file manifests and each file's metadata, run:
```sql
SELECT * FROM prod.db.table.manifests
```
-```text
-+----------------------------------------------------------------------+--------+-------------------+---------------------+------------------------+---------------------------+--------------------------+--------------------------------------+
-| path |
length | partition_spec_id | added_snapshot_id | added_data_files_count |
existing_data_files_count | deleted_data_files_count | partition_summaries
|
-+----------------------------------------------------------------------+--------+-------------------+---------------------+------------------------+---------------------------+--------------------------+--------------------------------------+
+
+<div class="markdown-table-container" markdown="block">
+| path | length | partition_spec_id | added_snapshot_id |
added_data_files_count | existing_data_files_count | deleted_data_files_count |
partition_summaries |
+| -- | -- | -- | -- | -- | -- | -- | -- |
| s3://.../table/metadata/45b5290b-ee61-4788-b324-b1e2735c0e10-m0.avro | 4479
| 0 | 6668963634911763636 | 8 | 0
| 0 |
[[false,null,2019-05-13,2019-05-15]] |
-+----------------------------------------------------------------------+--------+-------------------+---------------------+------------------------+---------------------------+--------------------------+--------------------------------------+
+</div>
+
+!!! Note
+ 1. Fields within `partition_summaries` column of the manifests table
correspond to `field_summary` structs within [manifest
list](./spec.md#manifest-lists), with the following order:
+ - `contains_null`
+ - `contains_nan`
+ - `lower_bound`
+ - `upper_bound`
+ 2. `contains_nan` could return null, which indicates that this information
is not available from files' metadata.
+ This usually occurs when reading from V1 table, where `contains_nan`
is not populated.
+
+### Partitions
+
+To show a table's current partitions
+
+```sql
+SELECT * FROM prod.db.table.partitions
```
-Note:
-1. Fields within `partition_summaries` column of the manifests table
correspond to `field_summary` structs within [manifest
list](./spec.md#manifest-lists), with the following order:
- - `contains_null`
- - `contains_nan`
- - `lower_bound`
- - `upper_bound`
-2. `contains_nan` could return null, which indicates that this information is
not available from files' metadata.
- This usually occurs when reading from V1 table, where `contains_nan` is not
populated.
+<div class="markdown-table-container" markdown="block">
+| partition | record_count | file_count |
+| -- | -- | -- |
+| {20211001, 11}| 1| 1|
+| {20211002, 11}| 1| 1|
+| {20211001, 10}| 1| 1|
+| {20211002, 10}| 1| 1|
+</div>
+
+### All Metadata
+
+!!! Note
+ The metadata is readable from any snapshot currently tracked by the table
+
+!!! WARNING
+ The table's metadata may return **duplicate** rows
+
+#### All Data Files
+
+To show a table's valid data files and each file's metadata, run:
+
+```sql
+SELECT * FROM prod.db.table.all_data_files
+```
+
+<div class="markdown-table-container" markdown="block">
+| content | file_path | file_format | partition | record_count |
file_size_in_bytes | column_sizes| value_counts | null_value_counts |
nan_value_counts| lower_bounds|
upper_bounds|key_metadata|split_offsets|equality_ids|sort_order_id|
+| -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- |
-- |
+| 0|s3:.../dt=20210102/xxx.parquet| PARQUET|{20210102}| 14|
2444|{1 -> 94, 2 -> 17}|{1 -> 14, 2 -> 14}| {1 -> 0, 2 -> 0}|
{}|{1 -> 1, 2 -> 20210102}|{1 -> 2, 2 -> 20210102}| null|
[4]| null| 0|
+| 0|s3:.../dt=20210103/xxx.parquet| PARQUET|{20210103}| 14|
2444|{1 -> 94, 2 -> 17}|{1 -> 14, 2 -> 14}| {1 -> 0, 2 -> 0}|
{}|{1 -> 1, 2 -> 20210103}|{1 -> 3, 2 -> 20210103}| null|
[4]| null| 0|
+| 0|s3:.../dt=20210104/xxx.parquet| PARQUET|{20210104}| 14|
2444|{1 -> 94, 2 -> 17}|{1 -> 14, 2 -> 14}| {1 -> 0, 2 -> 0}|
{}|{1 -> 1, 2 -> 20210104}|{1 -> 3, 2 -> 20210104}| null|
[4]| null| 0|
+</div>
+
+#### All Manifests
+
+To show a table's valid file manifests and each file's metadata, run:
+
+```sql
+SELECT * FROM prod.db.table.all_manifests
+```
+
+<div class="markdown-table-container" markdown="block">
+| path | length | partition_spec_id | added_snapshot_id |
added_data_files_count | existing_data_files_count | deleted_data_files_count|
partition_summaries|
+| -- | -- | -- | -- | -- | -- | -- | -- |
+| s3:/.../table/data/dt=20210102/xxx.parquet | 6376 | 0 | 6272782676904868561
| 2 | 0 | 0 |[{false, false, 10, 11}, {false, false, 20210101, 20210101}]|
+</div>
+
+!!! Note
+ 1. A valid manifest file is one that is referenced from any snapshot
currently tracked by the table.
+ 2. Fields within `partition_summaries` column of the manifests table
correspond to `field_summary` structs within [manifest
list](./spec.md#manifest-lists), with the following order:
+ - `contains_null`
+ - `contains_nan`
+ - `lower_bound`
+ - `upper_bound`
+ 3. `contains_nan` could return null, which indicates that this information
is not available from files' metadata.
+ This usually occurs when reading from V1 table, where `contains_nan`
is not populated.
+
+### Advanced Usage
+
+#### Entries
+
+To show a table's current manifest entries as rows, for both delete and data
files, run:
+
+```sql
+SELECT * FROM prod.db.table.entries
+```
+
+<div class="markdown-table-container" markdown="block">
+| status|snapshot_id | sequence_number | data_file |
+| -- | -- | -- | -- |
+|0 |7462238160765527919|0 |{0,
s3://.../dt=20211001/age=10/00000-0-38e4b886-b445-40a9-8db0-58653a331aba-00001.parquet,
PARQUET, {20211001, 10}, 1, 1132, {1 -> 47, 2 -> 52, 3 -> 47, 4 -> 55}, {1 ->
1, 2 -> 1, 3 -> 1, 4 -> 1}, {1 -> 0, 2 -> 0, 3 -> 0, 4 -> 0}, {}, {1 ->