shardulm94 commented on a change in pull request #1425:
URL: https://github.com/apache/iceberg/pull/1425#discussion_r486839711
##########
File path: site/docs/spark.md
##########
@@ -619,13 +619,30 @@ To show a table's data files and each file's metadata,
run:
SELECT * FROM prod.db.table.files
```
```text
-+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+
-| file_path |
file_format | record_count | file_size_in_bytes | column_sizes |
value_counts | null_value_counts | lower_bounds | upper_bounds |
key_metadata | split_offsets |
-+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+
-| s3:/.../table/data/00000-3-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> c] | [1 -> , 2 -> c] | null
| [4] |
-| s3:/.../table/data/00001-4-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> b] | [1 -> , 2 -> b] | null
| [4] |
-| s3:/.../table/data/00002-5-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> a] | [1 -> , 2 -> a] | null
| [4] |
-+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+
++---------+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+--------------+
+| content | file_path
| file_format | record_count | file_size_in_bytes | column_sizes |
value_counts | null_value_counts | lower_bounds | upper_bounds |
key_metadata | split_offsets | equality_ids |
++---------+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+--------------+
+| 0 |
s3:/.../table/data/00000-3-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> c] | [1 -> , 2 -> c] | null
| [4] | null |
+| 0 |
s3:/.../table/data/00001-4-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> b] | [1 -> , 2 -> b] | null
| [4] | null |
+| 0 |
s3:/.../table/data/00002-5-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> a] | [1 -> , 2 -> a] | null
| [4] | null |
++-----------------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+--------------+
+```
Review comment:
I think the `content` column will be useful to differentiate between
data and delete files. But I do agree that the int column is not useful. We
should probably display the FileContent name here?
##########
File path: site/docs/spark.md
##########
@@ -619,13 +619,30 @@ To show a table's data files and each file's metadata,
run:
SELECT * FROM prod.db.table.files
```
```text
-+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+
-| file_path |
file_format | record_count | file_size_in_bytes | column_sizes |
value_counts | null_value_counts | lower_bounds | upper_bounds |
key_metadata | split_offsets |
-+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+
-| s3:/.../table/data/00000-3-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> c] | [1 -> , 2 -> c] | null
| [4] |
-| s3:/.../table/data/00001-4-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> b] | [1 -> , 2 -> b] | null
| [4] |
-| s3:/.../table/data/00002-5-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> a] | [1 -> , 2 -> a] | null
| [4] |
-+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+
++---------+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+--------------+
+| content | file_path
| file_format | record_count | file_size_in_bytes | column_sizes |
value_counts | null_value_counts | lower_bounds | upper_bounds |
key_metadata | split_offsets | equality_ids |
++---------+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+--------------+
+| 0 |
s3:/.../table/data/00000-3-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> c] | [1 -> , 2 -> c] | null
| [4] | null |
+| 0 |
s3:/.../table/data/00001-4-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> b] | [1 -> , 2 -> b] | null
| [4] | null |
+| 0 |
s3:/.../table/data/00002-5-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> a] | [1 -> , 2 -> a] | null
| [4] | null |
++-----------------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+--------------+
+```
+
+### Entries
+
+To show a table's manifest entries, run:
+
+```sql
+SELECT * FROM prod.db.table.entries
+```
+```text
++--------+----------------+-----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
+| status | snapshot_id | sequence_number | data_file
|
++--------+----------------+-----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
+| 1 | 57897183625154 | 0 | [0,
s3:/.../table/data/00000-3-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet,
PARQUET, 1, 597, [1 -> 90, 2 -> 62], [1 -> 1, 2 -> 1], [1 -> 0, 2 -> 0], [1 ->
, 2 -> c], [1 -> , 2 -> c], null, [4], null] |
+| 1 | 57897183625154 | 0 | [0,
s3:/.../table/data/00001-4-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet,
PARQUET, 1, 597, [1 -> 90, 2 -> 62], [1 -> 1, 2 -> 1], [1 -> 0, 2 -> 0], [1 ->
, 2 -> b], [1 -> , 2 -> b], null, [4], null] |
+| 1 | 57897183625154 | 0 | [0,
s3:/.../table/data/00002-5-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet,
PARQUET, 1, 597, [1 -> 90, 2 -> 62], [1 -> 1, 2 -> 1], [1 -> 0, 2 -> 0], [1 ->
, 2 -> a], [1 -> , 2 -> a], null, [4], null] |
++--------+----------------+-----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
Review comment:
I think that makes sense. As I mentioned in #1365, I wasn't sure whether
this was intentional, so I just went ahead and added it. I can remove this. I
have found the `entries` table to be really useful for looking at the set of
files that have come between two snapshots (or in a set of snapshots).
##########
File path: site/docs/spark.md
##########
@@ -619,13 +619,30 @@ To show a table's data files and each file's metadata,
run:
SELECT * FROM prod.db.table.files
```
```text
-+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+
-| file_path |
file_format | record_count | file_size_in_bytes | column_sizes |
value_counts | null_value_counts | lower_bounds | upper_bounds |
key_metadata | split_offsets |
-+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+
-| s3:/.../table/data/00000-3-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> c] | [1 -> , 2 -> c] | null
| [4] |
-| s3:/.../table/data/00001-4-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> b] | [1 -> , 2 -> b] | null
| [4] |
-| s3:/.../table/data/00002-5-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> a] | [1 -> , 2 -> a] | null
| [4] |
-+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+
++---------+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+--------------+
+| content | file_path
| file_format | record_count | file_size_in_bytes | column_sizes |
value_counts | null_value_counts | lower_bounds | upper_bounds |
key_metadata | split_offsets | equality_ids |
++---------+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+--------------+
+| 0 |
s3:/.../table/data/00000-3-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> c] | [1 -> , 2 -> c] | null
| [4] | null |
+| 0 |
s3:/.../table/data/00001-4-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> b] | [1 -> , 2 -> b] | null
| [4] | null |
+| 0 |
s3:/.../table/data/00002-5-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> a] | [1 -> , 2 -> a] | null
| [4] | null |
++-----------------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+--------------+
+```
Review comment:
Seems like the files table only returns data files and not delete files,
so content will always be `0`. We can remove this field. Same with
`equalityIds`, this will always be null since the files table only covers data
files.
##########
File path: site/docs/spark.md
##########
@@ -619,13 +619,30 @@ To show a table's data files and each file's metadata,
run:
SELECT * FROM prod.db.table.files
```
```text
-+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+
-| file_path |
file_format | record_count | file_size_in_bytes | column_sizes |
value_counts | null_value_counts | lower_bounds | upper_bounds |
key_metadata | split_offsets |
-+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+
-| s3:/.../table/data/00000-3-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> c] | [1 -> , 2 -> c] | null
| [4] |
-| s3:/.../table/data/00001-4-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> b] | [1 -> , 2 -> b] | null
| [4] |
-| s3:/.../table/data/00002-5-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> a] | [1 -> , 2 -> a] | null
| [4] |
-+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+
++---------+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+--------------+
+| content | file_path
| file_format | record_count | file_size_in_bytes | column_sizes |
value_counts | null_value_counts | lower_bounds | upper_bounds |
key_metadata | split_offsets | equality_ids |
++---------+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+--------------+
+| 0 |
s3:/.../table/data/00000-3-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> c] | [1 -> , 2 -> c] | null
| [4] | null |
+| 0 |
s3:/.../table/data/00001-4-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> b] | [1 -> , 2 -> b] | null
| [4] | null |
+| 0 |
s3:/.../table/data/00002-5-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> a] | [1 -> , 2 -> a] | null
| [4] | null |
++-----------------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+--------------+
+```
Review comment:
I think the `content` column will be useful to differentiate between
data and delete files. But I do agree that the int column is not useful. We
should probably display the FileContent name here?
##########
File path: site/docs/spark.md
##########
@@ -619,13 +619,30 @@ To show a table's data files and each file's metadata,
run:
SELECT * FROM prod.db.table.files
```
```text
-+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+
-| file_path |
file_format | record_count | file_size_in_bytes | column_sizes |
value_counts | null_value_counts | lower_bounds | upper_bounds |
key_metadata | split_offsets |
-+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+
-| s3:/.../table/data/00000-3-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> c] | [1 -> , 2 -> c] | null
| [4] |
-| s3:/.../table/data/00001-4-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> b] | [1 -> , 2 -> b] | null
| [4] |
-| s3:/.../table/data/00002-5-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> a] | [1 -> , 2 -> a] | null
| [4] |
-+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+
++---------+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+--------------+
+| content | file_path
| file_format | record_count | file_size_in_bytes | column_sizes |
value_counts | null_value_counts | lower_bounds | upper_bounds |
key_metadata | split_offsets | equality_ids |
++---------+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+--------------+
+| 0 |
s3:/.../table/data/00000-3-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> c] | [1 -> , 2 -> c] | null
| [4] | null |
+| 0 |
s3:/.../table/data/00001-4-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> b] | [1 -> , 2 -> b] | null
| [4] | null |
+| 0 |
s3:/.../table/data/00002-5-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> a] | [1 -> , 2 -> a] | null
| [4] | null |
++-----------------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+--------------+
+```
+
+### Entries
+
+To show a table's manifest entries, run:
+
+```sql
+SELECT * FROM prod.db.table.entries
+```
+```text
++--------+----------------+-----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
+| status | snapshot_id | sequence_number | data_file
|
++--------+----------------+-----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
+| 1 | 57897183625154 | 0 | [0,
s3:/.../table/data/00000-3-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet,
PARQUET, 1, 597, [1 -> 90, 2 -> 62], [1 -> 1, 2 -> 1], [1 -> 0, 2 -> 0], [1 ->
, 2 -> c], [1 -> , 2 -> c], null, [4], null] |
+| 1 | 57897183625154 | 0 | [0,
s3:/.../table/data/00001-4-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet,
PARQUET, 1, 597, [1 -> 90, 2 -> 62], [1 -> 1, 2 -> 1], [1 -> 0, 2 -> 0], [1 ->
, 2 -> b], [1 -> , 2 -> b], null, [4], null] |
+| 1 | 57897183625154 | 0 | [0,
s3:/.../table/data/00002-5-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet,
PARQUET, 1, 597, [1 -> 90, 2 -> 62], [1 -> 1, 2 -> 1], [1 -> 0, 2 -> 0], [1 ->
, 2 -> a], [1 -> , 2 -> a], null, [4], null] |
++--------+----------------+-----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
Review comment:
I think that makes sense. As I mentioned in #1365, I wasn't sure whether
this was intentional, so I just went ahead and added it. I can remove this. I
have found the `entries` table to be really useful for looking at the set of
files that have come between two snapshots (or in a set of snapshots).
##########
File path: site/docs/spark.md
##########
@@ -619,13 +619,30 @@ To show a table's data files and each file's metadata,
run:
SELECT * FROM prod.db.table.files
```
```text
-+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+
-| file_path |
file_format | record_count | file_size_in_bytes | column_sizes |
value_counts | null_value_counts | lower_bounds | upper_bounds |
key_metadata | split_offsets |
-+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+
-| s3:/.../table/data/00000-3-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> c] | [1 -> , 2 -> c] | null
| [4] |
-| s3:/.../table/data/00001-4-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> b] | [1 -> , 2 -> b] | null
| [4] |
-| s3:/.../table/data/00002-5-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> a] | [1 -> , 2 -> a] | null
| [4] |
-+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+
++---------+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+--------------+
+| content | file_path
| file_format | record_count | file_size_in_bytes | column_sizes |
value_counts | null_value_counts | lower_bounds | upper_bounds |
key_metadata | split_offsets | equality_ids |
++---------+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+--------------+
+| 0 |
s3:/.../table/data/00000-3-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> c] | [1 -> , 2 -> c] | null
| [4] | null |
+| 0 |
s3:/.../table/data/00001-4-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> b] | [1 -> , 2 -> b] | null
| [4] | null |
+| 0 |
s3:/.../table/data/00002-5-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> a] | [1 -> , 2 -> a] | null
| [4] | null |
++-----------------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+--------------+
+```
Review comment:
Seems like the files table only returns data files and not delete files,
so content will always be `0`. We can remove this field. Same with
`equalityIds`, this will always be null since the files table only covers data
files.
##########
File path: site/docs/spark.md
##########
@@ -619,13 +619,30 @@ To show a table's data files and each file's metadata,
run:
SELECT * FROM prod.db.table.files
```
```text
-+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+
-| file_path |
file_format | record_count | file_size_in_bytes | column_sizes |
value_counts | null_value_counts | lower_bounds | upper_bounds |
key_metadata | split_offsets |
-+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+
-| s3:/.../table/data/00000-3-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> c] | [1 -> , 2 -> c] | null
| [4] |
-| s3:/.../table/data/00001-4-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> b] | [1 -> , 2 -> b] | null
| [4] |
-| s3:/.../table/data/00002-5-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> a] | [1 -> , 2 -> a] | null
| [4] |
-+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+
++---------+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+--------------+
+| content | file_path
| file_format | record_count | file_size_in_bytes | column_sizes |
value_counts | null_value_counts | lower_bounds | upper_bounds |
key_metadata | split_offsets | equality_ids |
++---------+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+--------------+
+| 0 |
s3:/.../table/data/00000-3-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> c] | [1 -> , 2 -> c] | null
| [4] | null |
+| 0 |
s3:/.../table/data/00001-4-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> b] | [1 -> , 2 -> b] | null
| [4] | null |
+| 0 |
s3:/.../table/data/00002-5-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> a] | [1 -> , 2 -> a] | null
| [4] | null |
++-----------------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+--------------+
+```
Review comment:
I think the `content` column will be useful to differentiate between
data and delete files. But I do agree that the int column is not useful. We
should probably display the FileContent name here?
##########
File path: site/docs/spark.md
##########
@@ -619,13 +619,30 @@ To show a table's data files and each file's metadata,
run:
SELECT * FROM prod.db.table.files
```
```text
-+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+
-| file_path |
file_format | record_count | file_size_in_bytes | column_sizes |
value_counts | null_value_counts | lower_bounds | upper_bounds |
key_metadata | split_offsets |
-+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+
-| s3:/.../table/data/00000-3-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> c] | [1 -> , 2 -> c] | null
| [4] |
-| s3:/.../table/data/00001-4-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> b] | [1 -> , 2 -> b] | null
| [4] |
-| s3:/.../table/data/00002-5-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> a] | [1 -> , 2 -> a] | null
| [4] |
-+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+
++---------+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+--------------+
+| content | file_path
| file_format | record_count | file_size_in_bytes | column_sizes |
value_counts | null_value_counts | lower_bounds | upper_bounds |
key_metadata | split_offsets | equality_ids |
++---------+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+--------------+
+| 0 |
s3:/.../table/data/00000-3-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> c] | [1 -> , 2 -> c] | null
| [4] | null |
+| 0 |
s3:/.../table/data/00001-4-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> b] | [1 -> , 2 -> b] | null
| [4] | null |
+| 0 |
s3:/.../table/data/00002-5-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> a] | [1 -> , 2 -> a] | null
| [4] | null |
++-----------------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+--------------+
+```
+
+### Entries
+
+To show a table's manifest entries, run:
+
+```sql
+SELECT * FROM prod.db.table.entries
+```
+```text
++--------+----------------+-----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
+| status | snapshot_id | sequence_number | data_file
|
++--------+----------------+-----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
+| 1 | 57897183625154 | 0 | [0,
s3:/.../table/data/00000-3-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet,
PARQUET, 1, 597, [1 -> 90, 2 -> 62], [1 -> 1, 2 -> 1], [1 -> 0, 2 -> 0], [1 ->
, 2 -> c], [1 -> , 2 -> c], null, [4], null] |
+| 1 | 57897183625154 | 0 | [0,
s3:/.../table/data/00001-4-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet,
PARQUET, 1, 597, [1 -> 90, 2 -> 62], [1 -> 1, 2 -> 1], [1 -> 0, 2 -> 0], [1 ->
, 2 -> b], [1 -> , 2 -> b], null, [4], null] |
+| 1 | 57897183625154 | 0 | [0,
s3:/.../table/data/00002-5-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet,
PARQUET, 1, 597, [1 -> 90, 2 -> 62], [1 -> 1, 2 -> 1], [1 -> 0, 2 -> 0], [1 ->
, 2 -> a], [1 -> , 2 -> a], null, [4], null] |
++--------+----------------+-----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
Review comment:
I think that makes sense. As I mentioned in #1365, I wasn't sure whether
this was intentional, so I just went ahead and added it. I can remove this. I
have found the `entries` table to be really useful for looking at the set of
files that have come between two snapshots (or in a set of snapshots).
##########
File path: site/docs/spark.md
##########
@@ -619,13 +619,30 @@ To show a table's data files and each file's metadata,
run:
SELECT * FROM prod.db.table.files
```
```text
-+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+
-| file_path |
file_format | record_count | file_size_in_bytes | column_sizes |
value_counts | null_value_counts | lower_bounds | upper_bounds |
key_metadata | split_offsets |
-+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+
-| s3:/.../table/data/00000-3-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> c] | [1 -> , 2 -> c] | null
| [4] |
-| s3:/.../table/data/00001-4-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> b] | [1 -> , 2 -> b] | null
| [4] |
-| s3:/.../table/data/00002-5-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> a] | [1 -> , 2 -> a] | null
| [4] |
-+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+
++---------+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+--------------+
+| content | file_path
| file_format | record_count | file_size_in_bytes | column_sizes |
value_counts | null_value_counts | lower_bounds | upper_bounds |
key_metadata | split_offsets | equality_ids |
++---------+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+--------------+
+| 0 |
s3:/.../table/data/00000-3-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> c] | [1 -> , 2 -> c] | null
| [4] | null |
+| 0 |
s3:/.../table/data/00001-4-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> b] | [1 -> , 2 -> b] | null
| [4] | null |
+| 0 |
s3:/.../table/data/00002-5-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> a] | [1 -> , 2 -> a] | null
| [4] | null |
++-----------------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+--------------+
+```
Review comment:
Seems like the files table only returns data files and not delete files,
so content will always be `0`. We can remove this field. Same with
`equalityIds`, this will always be null since the files table only covers data
files.
##########
File path: site/docs/spark.md
##########
@@ -619,13 +619,30 @@ To show a table's data files and each file's metadata,
run:
SELECT * FROM prod.db.table.files
```
```text
-+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+
-| file_path |
file_format | record_count | file_size_in_bytes | column_sizes |
value_counts | null_value_counts | lower_bounds | upper_bounds |
key_metadata | split_offsets |
-+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+
-| s3:/.../table/data/00000-3-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> c] | [1 -> , 2 -> c] | null
| [4] |
-| s3:/.../table/data/00001-4-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> b] | [1 -> , 2 -> b] | null
| [4] |
-| s3:/.../table/data/00002-5-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> a] | [1 -> , 2 -> a] | null
| [4] |
-+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+
++---------+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+--------------+
+| content | file_path
| file_format | record_count | file_size_in_bytes | column_sizes |
value_counts | null_value_counts | lower_bounds | upper_bounds |
key_metadata | split_offsets | equality_ids |
++---------+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+--------------+
+| 0 |
s3:/.../table/data/00000-3-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> c] | [1 -> , 2 -> c] | null
| [4] | null |
+| 0 |
s3:/.../table/data/00001-4-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> b] | [1 -> , 2 -> b] | null
| [4] | null |
+| 0 |
s3:/.../table/data/00002-5-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> a] | [1 -> , 2 -> a] | null
| [4] | null |
++-----------------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+--------------+
+```
Review comment:
I think the `content` column will be useful to differentiate between
data and delete files. But I do agree that the int column is not useful. We
should probably display the FileContent name here?
##########
File path: site/docs/spark.md
##########
@@ -619,13 +619,30 @@ To show a table's data files and each file's metadata,
run:
SELECT * FROM prod.db.table.files
```
```text
-+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+
-| file_path |
file_format | record_count | file_size_in_bytes | column_sizes |
value_counts | null_value_counts | lower_bounds | upper_bounds |
key_metadata | split_offsets |
-+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+
-| s3:/.../table/data/00000-3-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> c] | [1 -> , 2 -> c] | null
| [4] |
-| s3:/.../table/data/00001-4-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> b] | [1 -> , 2 -> b] | null
| [4] |
-| s3:/.../table/data/00002-5-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> a] | [1 -> , 2 -> a] | null
| [4] |
-+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+
++---------+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+--------------+
+| content | file_path
| file_format | record_count | file_size_in_bytes | column_sizes |
value_counts | null_value_counts | lower_bounds | upper_bounds |
key_metadata | split_offsets | equality_ids |
++---------+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+--------------+
+| 0 |
s3:/.../table/data/00000-3-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> c] | [1 -> , 2 -> c] | null
| [4] | null |
+| 0 |
s3:/.../table/data/00001-4-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> b] | [1 -> , 2 -> b] | null
| [4] | null |
+| 0 |
s3:/.../table/data/00002-5-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> a] | [1 -> , 2 -> a] | null
| [4] | null |
++-----------------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+--------------+
+```
+
+### Entries
+
+To show a table's manifest entries, run:
+
+```sql
+SELECT * FROM prod.db.table.entries
+```
+```text
++--------+----------------+-----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
+| status | snapshot_id | sequence_number | data_file
|
++--------+----------------+-----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
+| 1 | 57897183625154 | 0 | [0,
s3:/.../table/data/00000-3-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet,
PARQUET, 1, 597, [1 -> 90, 2 -> 62], [1 -> 1, 2 -> 1], [1 -> 0, 2 -> 0], [1 ->
, 2 -> c], [1 -> , 2 -> c], null, [4], null] |
+| 1 | 57897183625154 | 0 | [0,
s3:/.../table/data/00001-4-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet,
PARQUET, 1, 597, [1 -> 90, 2 -> 62], [1 -> 1, 2 -> 1], [1 -> 0, 2 -> 0], [1 ->
, 2 -> b], [1 -> , 2 -> b], null, [4], null] |
+| 1 | 57897183625154 | 0 | [0,
s3:/.../table/data/00002-5-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet,
PARQUET, 1, 597, [1 -> 90, 2 -> 62], [1 -> 1, 2 -> 1], [1 -> 0, 2 -> 0], [1 ->
, 2 -> a], [1 -> , 2 -> a], null, [4], null] |
++--------+----------------+-----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
Review comment:
I think that makes sense. As I mentioned in #1365, I wasn't sure whether
this was intentional, so I just went ahead and added it. I can remove this. I
have found the `entries` table to be really useful for looking at the set of
files that have come between two snapshots (or in a set of snapshots).
##########
File path: site/docs/spark.md
##########
@@ -619,13 +619,30 @@ To show a table's data files and each file's metadata,
run:
SELECT * FROM prod.db.table.files
```
```text
-+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+
-| file_path |
file_format | record_count | file_size_in_bytes | column_sizes |
value_counts | null_value_counts | lower_bounds | upper_bounds |
key_metadata | split_offsets |
-+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+
-| s3:/.../table/data/00000-3-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> c] | [1 -> , 2 -> c] | null
| [4] |
-| s3:/.../table/data/00001-4-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> b] | [1 -> , 2 -> b] | null
| [4] |
-| s3:/.../table/data/00002-5-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> a] | [1 -> , 2 -> a] | null
| [4] |
-+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+
++---------+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+--------------+
+| content | file_path
| file_format | record_count | file_size_in_bytes | column_sizes |
value_counts | null_value_counts | lower_bounds | upper_bounds |
key_metadata | split_offsets | equality_ids |
++---------+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+--------------+
+| 0 |
s3:/.../table/data/00000-3-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> c] | [1 -> , 2 -> c] | null
| [4] | null |
+| 0 |
s3:/.../table/data/00001-4-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> b] | [1 -> , 2 -> b] | null
| [4] | null |
+| 0 |
s3:/.../table/data/00002-5-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> a] | [1 -> , 2 -> a] | null
| [4] | null |
++-----------------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+--------------+
+```
Review comment:
Seems like the files table only returns data files and not delete files,
so content will always be `0`. We can remove this field. Same with
`equalityIds`, this will always be null since the files table only covers data
files.
##########
File path: site/docs/spark.md
##########
@@ -619,13 +619,30 @@ To show a table's data files and each file's metadata,
run:
SELECT * FROM prod.db.table.files
```
```text
-+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+
-| file_path |
file_format | record_count | file_size_in_bytes | column_sizes |
value_counts | null_value_counts | lower_bounds | upper_bounds |
key_metadata | split_offsets |
-+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+
-| s3:/.../table/data/00000-3-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> c] | [1 -> , 2 -> c] | null
| [4] |
-| s3:/.../table/data/00001-4-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> b] | [1 -> , 2 -> b] | null
| [4] |
-| s3:/.../table/data/00002-5-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> a] | [1 -> , 2 -> a] | null
| [4] |
-+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+
++---------+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+--------------+
+| content | file_path
| file_format | record_count | file_size_in_bytes | column_sizes |
value_counts | null_value_counts | lower_bounds | upper_bounds |
key_metadata | split_offsets | equality_ids |
++---------+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+--------------+
+| 0 |
s3:/.../table/data/00000-3-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> c] | [1 -> , 2 -> c] | null
| [4] | null |
+| 0 |
s3:/.../table/data/00001-4-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> b] | [1 -> , 2 -> b] | null
| [4] | null |
+| 0 |
s3:/.../table/data/00002-5-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> a] | [1 -> , 2 -> a] | null
| [4] | null |
++-----------------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+--------------+
+```
Review comment:
I think the `content` column will be useful to differentiate between
data and delete files. But I do agree that the int column is not useful. We
should probably display the FileContent name here?
##########
File path: site/docs/spark.md
##########
@@ -619,13 +619,30 @@ To show a table's data files and each file's metadata,
run:
SELECT * FROM prod.db.table.files
```
```text
-+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+
-| file_path |
file_format | record_count | file_size_in_bytes | column_sizes |
value_counts | null_value_counts | lower_bounds | upper_bounds |
key_metadata | split_offsets |
-+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+
-| s3:/.../table/data/00000-3-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> c] | [1 -> , 2 -> c] | null
| [4] |
-| s3:/.../table/data/00001-4-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> b] | [1 -> , 2 -> b] | null
| [4] |
-| s3:/.../table/data/00002-5-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> a] | [1 -> , 2 -> a] | null
| [4] |
-+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+
++---------+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+--------------+
+| content | file_path
| file_format | record_count | file_size_in_bytes | column_sizes |
value_counts | null_value_counts | lower_bounds | upper_bounds |
key_metadata | split_offsets | equality_ids |
++---------+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+--------------+
+| 0 |
s3:/.../table/data/00000-3-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> c] | [1 -> , 2 -> c] | null
| [4] | null |
+| 0 |
s3:/.../table/data/00001-4-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> b] | [1 -> , 2 -> b] | null
| [4] | null |
+| 0 |
s3:/.../table/data/00002-5-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> a] | [1 -> , 2 -> a] | null
| [4] | null |
++-----------------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+--------------+
+```
+
+### Entries
+
+To show a table's manifest entries, run:
+
+```sql
+SELECT * FROM prod.db.table.entries
+```
+```text
++--------+----------------+-----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
+| status | snapshot_id | sequence_number | data_file
|
++--------+----------------+-----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
+| 1 | 57897183625154 | 0 | [0,
s3:/.../table/data/00000-3-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet,
PARQUET, 1, 597, [1 -> 90, 2 -> 62], [1 -> 1, 2 -> 1], [1 -> 0, 2 -> 0], [1 ->
, 2 -> c], [1 -> , 2 -> c], null, [4], null] |
+| 1 | 57897183625154 | 0 | [0,
s3:/.../table/data/00001-4-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet,
PARQUET, 1, 597, [1 -> 90, 2 -> 62], [1 -> 1, 2 -> 1], [1 -> 0, 2 -> 0], [1 ->
, 2 -> b], [1 -> , 2 -> b], null, [4], null] |
+| 1 | 57897183625154 | 0 | [0,
s3:/.../table/data/00002-5-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet,
PARQUET, 1, 597, [1 -> 90, 2 -> 62], [1 -> 1, 2 -> 1], [1 -> 0, 2 -> 0], [1 ->
, 2 -> a], [1 -> , 2 -> a], null, [4], null] |
++--------+----------------+-----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
Review comment:
I think that makes sense. As I mentioned in #1365, I wasn't sure whether
this was intentional, so I just went ahead and added it. I can remove this. I
have found the `entries` table to be really useful for looking at the set of
files that have come between two snapshots (or in a set of snapshots).
##########
File path: site/docs/spark.md
##########
@@ -619,13 +619,30 @@ To show a table's data files and each file's metadata,
run:
SELECT * FROM prod.db.table.files
```
```text
-+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+
-| file_path |
file_format | record_count | file_size_in_bytes | column_sizes |
value_counts | null_value_counts | lower_bounds | upper_bounds |
key_metadata | split_offsets |
-+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+
-| s3:/.../table/data/00000-3-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> c] | [1 -> , 2 -> c] | null
| [4] |
-| s3:/.../table/data/00001-4-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> b] | [1 -> , 2 -> b] | null
| [4] |
-| s3:/.../table/data/00002-5-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> a] | [1 -> , 2 -> a] | null
| [4] |
-+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+
++---------+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+--------------+
+| content | file_path
| file_format | record_count | file_size_in_bytes | column_sizes |
value_counts | null_value_counts | lower_bounds | upper_bounds |
key_metadata | split_offsets | equality_ids |
++---------+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+--------------+
+| 0 |
s3:/.../table/data/00000-3-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> c] | [1 -> , 2 -> c] | null
| [4] | null |
+| 0 |
s3:/.../table/data/00001-4-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> b] | [1 -> , 2 -> b] | null
| [4] | null |
+| 0 |
s3:/.../table/data/00002-5-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> a] | [1 -> , 2 -> a] | null
| [4] | null |
++-----------------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+--------------+
+```
Review comment:
Seems like the files table only returns data files and not delete files,
so content will always be `0`. We can remove this field. Same with
`equalityIds`, this will always be null since the files table only covers data
files.
##########
File path: site/docs/spark.md
##########
@@ -619,13 +619,30 @@ To show a table's data files and each file's metadata,
run:
SELECT * FROM prod.db.table.files
```
```text
-+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+
-| file_path |
file_format | record_count | file_size_in_bytes | column_sizes |
value_counts | null_value_counts | lower_bounds | upper_bounds |
key_metadata | split_offsets |
-+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+
-| s3:/.../table/data/00000-3-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> c] | [1 -> , 2 -> c] | null
| [4] |
-| s3:/.../table/data/00001-4-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> b] | [1 -> , 2 -> b] | null
| [4] |
-| s3:/.../table/data/00002-5-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> a] | [1 -> , 2 -> a] | null
| [4] |
-+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+
++---------+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+--------------+
+| content | file_path
| file_format | record_count | file_size_in_bytes | column_sizes |
value_counts | null_value_counts | lower_bounds | upper_bounds |
key_metadata | split_offsets | equality_ids |
++---------+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+--------------+
+| 0 |
s3:/.../table/data/00000-3-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> c] | [1 -> , 2 -> c] | null
| [4] | null |
+| 0 |
s3:/.../table/data/00001-4-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> b] | [1 -> , 2 -> b] | null
| [4] | null |
+| 0 |
s3:/.../table/data/00002-5-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> a] | [1 -> , 2 -> a] | null
| [4] | null |
++-----------------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+--------------+
+```
Review comment:
I think the `content` column will be useful to differentiate between
data and delete files. But I do agree that the int column is not useful. We
should probably display the FileContent name here?
##########
File path: site/docs/spark.md
##########
@@ -619,13 +619,30 @@ To show a table's data files and each file's metadata,
run:
SELECT * FROM prod.db.table.files
```
```text
-+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+
-| file_path |
file_format | record_count | file_size_in_bytes | column_sizes |
value_counts | null_value_counts | lower_bounds | upper_bounds |
key_metadata | split_offsets |
-+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+
-| s3:/.../table/data/00000-3-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> c] | [1 -> , 2 -> c] | null
| [4] |
-| s3:/.../table/data/00001-4-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> b] | [1 -> , 2 -> b] | null
| [4] |
-| s3:/.../table/data/00002-5-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> a] | [1 -> , 2 -> a] | null
| [4] |
-+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+
++---------+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+--------------+
+| content | file_path
| file_format | record_count | file_size_in_bytes | column_sizes |
value_counts | null_value_counts | lower_bounds | upper_bounds |
key_metadata | split_offsets | equality_ids |
++---------+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+--------------+
+| 0 |
s3:/.../table/data/00000-3-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> c] | [1 -> , 2 -> c] | null
| [4] | null |
+| 0 |
s3:/.../table/data/00001-4-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> b] | [1 -> , 2 -> b] | null
| [4] | null |
+| 0 |
s3:/.../table/data/00002-5-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> a] | [1 -> , 2 -> a] | null
| [4] | null |
++-----------------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+--------------+
+```
+
+### Entries
+
+To show a table's manifest entries, run:
+
+```sql
+SELECT * FROM prod.db.table.entries
+```
+```text
++--------+----------------+-----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
+| status | snapshot_id | sequence_number | data_file
|
++--------+----------------+-----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
+| 1 | 57897183625154 | 0 | [0,
s3:/.../table/data/00000-3-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet,
PARQUET, 1, 597, [1 -> 90, 2 -> 62], [1 -> 1, 2 -> 1], [1 -> 0, 2 -> 0], [1 ->
, 2 -> c], [1 -> , 2 -> c], null, [4], null] |
+| 1 | 57897183625154 | 0 | [0,
s3:/.../table/data/00001-4-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet,
PARQUET, 1, 597, [1 -> 90, 2 -> 62], [1 -> 1, 2 -> 1], [1 -> 0, 2 -> 0], [1 ->
, 2 -> b], [1 -> , 2 -> b], null, [4], null] |
+| 1 | 57897183625154 | 0 | [0,
s3:/.../table/data/00002-5-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet,
PARQUET, 1, 597, [1 -> 90, 2 -> 62], [1 -> 1, 2 -> 1], [1 -> 0, 2 -> 0], [1 ->
, 2 -> a], [1 -> , 2 -> a], null, [4], null] |
++--------+----------------+-----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
Review comment:
I think that makes sense. As I mentioned in #1365, I wasn't sure whether
this was intentional, so I just went ahead and added it. I can remove this. I
have found the `entries` table to be really useful for looking at the set of
files that have come between two snapshots (or in a set of snapshots).
##########
File path: site/docs/spark.md
##########
@@ -619,13 +619,30 @@ To show a table's data files and each file's metadata,
run:
SELECT * FROM prod.db.table.files
```
```text
-+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+
-| file_path |
file_format | record_count | file_size_in_bytes | column_sizes |
value_counts | null_value_counts | lower_bounds | upper_bounds |
key_metadata | split_offsets |
-+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+
-| s3:/.../table/data/00000-3-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> c] | [1 -> , 2 -> c] | null
| [4] |
-| s3:/.../table/data/00001-4-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> b] | [1 -> , 2 -> b] | null
| [4] |
-| s3:/.../table/data/00002-5-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> a] | [1 -> , 2 -> a] | null
| [4] |
-+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+
++---------+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+--------------+
+| content | file_path
| file_format | record_count | file_size_in_bytes | column_sizes |
value_counts | null_value_counts | lower_bounds | upper_bounds |
key_metadata | split_offsets | equality_ids |
++---------+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+--------------+
+| 0 |
s3:/.../table/data/00000-3-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> c] | [1 -> , 2 -> c] | null
| [4] | null |
+| 0 |
s3:/.../table/data/00001-4-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> b] | [1 -> , 2 -> b] | null
| [4] | null |
+| 0 |
s3:/.../table/data/00002-5-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> a] | [1 -> , 2 -> a] | null
| [4] | null |
++-----------------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+--------------+
+```
Review comment:
Seems like the files table only returns data files and not delete files,
so content will always be `0`. We can remove this field. Same with
`equalityIds`, this will always be null since the files table only covers data
files.
##########
File path: site/docs/spark.md
##########
@@ -619,13 +619,30 @@ To show a table's data files and each file's metadata,
run:
SELECT * FROM prod.db.table.files
```
```text
-+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+
-| file_path |
file_format | record_count | file_size_in_bytes | column_sizes |
value_counts | null_value_counts | lower_bounds | upper_bounds |
key_metadata | split_offsets |
-+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+
-| s3:/.../table/data/00000-3-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> c] | [1 -> , 2 -> c] | null
| [4] |
-| s3:/.../table/data/00001-4-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> b] | [1 -> , 2 -> b] | null
| [4] |
-| s3:/.../table/data/00002-5-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> a] | [1 -> , 2 -> a] | null
| [4] |
-+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+
++---------+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+--------------+
+| content | file_path
| file_format | record_count | file_size_in_bytes | column_sizes |
value_counts | null_value_counts | lower_bounds | upper_bounds |
key_metadata | split_offsets | equality_ids |
++---------+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+--------------+
+| 0 |
s3:/.../table/data/00000-3-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> c] | [1 -> , 2 -> c] | null
| [4] | null |
+| 0 |
s3:/.../table/data/00001-4-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> b] | [1 -> , 2 -> b] | null
| [4] | null |
+| 0 |
s3:/.../table/data/00002-5-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> a] | [1 -> , 2 -> a] | null
| [4] | null |
++-----------------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+--------------+
+```
Review comment:
I think the `content` column will be useful to differentiate between
data and delete files. But I do agree that the int column is not useful. We
should probably display the FileContent name here?
##########
File path: site/docs/spark.md
##########
@@ -619,13 +619,30 @@ To show a table's data files and each file's metadata,
run:
SELECT * FROM prod.db.table.files
```
```text
-+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+
-| file_path |
file_format | record_count | file_size_in_bytes | column_sizes |
value_counts | null_value_counts | lower_bounds | upper_bounds |
key_metadata | split_offsets |
-+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+
-| s3:/.../table/data/00000-3-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> c] | [1 -> , 2 -> c] | null
| [4] |
-| s3:/.../table/data/00001-4-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> b] | [1 -> , 2 -> b] | null
| [4] |
-| s3:/.../table/data/00002-5-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> a] | [1 -> , 2 -> a] | null
| [4] |
-+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+
++---------+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+--------------+
+| content | file_path
| file_format | record_count | file_size_in_bytes | column_sizes |
value_counts | null_value_counts | lower_bounds | upper_bounds |
key_metadata | split_offsets | equality_ids |
++---------+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+--------------+
+| 0 |
s3:/.../table/data/00000-3-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> c] | [1 -> , 2 -> c] | null
| [4] | null |
+| 0 |
s3:/.../table/data/00001-4-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> b] | [1 -> , 2 -> b] | null
| [4] | null |
+| 0 |
s3:/.../table/data/00002-5-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> a] | [1 -> , 2 -> a] | null
| [4] | null |
++-----------------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+--------------+
+```
+
+### Entries
+
+To show a table's manifest entries, run:
+
+```sql
+SELECT * FROM prod.db.table.entries
+```
+```text
++--------+----------------+-----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
+| status | snapshot_id | sequence_number | data_file
|
++--------+----------------+-----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
+| 1 | 57897183625154 | 0 | [0,
s3:/.../table/data/00000-3-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet,
PARQUET, 1, 597, [1 -> 90, 2 -> 62], [1 -> 1, 2 -> 1], [1 -> 0, 2 -> 0], [1 ->
, 2 -> c], [1 -> , 2 -> c], null, [4], null] |
+| 1 | 57897183625154 | 0 | [0,
s3:/.../table/data/00001-4-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet,
PARQUET, 1, 597, [1 -> 90, 2 -> 62], [1 -> 1, 2 -> 1], [1 -> 0, 2 -> 0], [1 ->
, 2 -> b], [1 -> , 2 -> b], null, [4], null] |
+| 1 | 57897183625154 | 0 | [0,
s3:/.../table/data/00002-5-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet,
PARQUET, 1, 597, [1 -> 90, 2 -> 62], [1 -> 1, 2 -> 1], [1 -> 0, 2 -> 0], [1 ->
, 2 -> a], [1 -> , 2 -> a], null, [4], null] |
++--------+----------------+-----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
Review comment:
I think that makes sense. As I mentioned in #1365, I wasn't sure whether
this was intentional, so I just went ahead and added it. I can remove this. I
have found the `entries` table to be really useful for looking at the set of
files that have come between two snapshots (or in a set of snapshots).
##########
File path: site/docs/spark.md
##########
@@ -619,13 +619,30 @@ To show a table's data files and each file's metadata,
run:
SELECT * FROM prod.db.table.files
```
```text
-+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+
-| file_path |
file_format | record_count | file_size_in_bytes | column_sizes |
value_counts | null_value_counts | lower_bounds | upper_bounds |
key_metadata | split_offsets |
-+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+
-| s3:/.../table/data/00000-3-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> c] | [1 -> , 2 -> c] | null
| [4] |
-| s3:/.../table/data/00001-4-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> b] | [1 -> , 2 -> b] | null
| [4] |
-| s3:/.../table/data/00002-5-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> a] | [1 -> , 2 -> a] | null
| [4] |
-+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+
++---------+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+--------------+
+| content | file_path
| file_format | record_count | file_size_in_bytes | column_sizes |
value_counts | null_value_counts | lower_bounds | upper_bounds |
key_metadata | split_offsets | equality_ids |
++---------+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+--------------+
+| 0 |
s3:/.../table/data/00000-3-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> c] | [1 -> , 2 -> c] | null
| [4] | null |
+| 0 |
s3:/.../table/data/00001-4-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> b] | [1 -> , 2 -> b] | null
| [4] | null |
+| 0 |
s3:/.../table/data/00002-5-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> a] | [1 -> , 2 -> a] | null
| [4] | null |
++-----------------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+--------------+
+```
Review comment:
Seems like the files table only returns data files and not delete files,
so content will always be `0`. We can remove this field. Same with
`equalityIds`, this will always be null since the files table only covers data
files.
##########
File path: site/docs/spark.md
##########
@@ -619,13 +619,30 @@ To show a table's data files and each file's metadata,
run:
SELECT * FROM prod.db.table.files
```
```text
-+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+
-| file_path |
file_format | record_count | file_size_in_bytes | column_sizes |
value_counts | null_value_counts | lower_bounds | upper_bounds |
key_metadata | split_offsets |
-+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+
-| s3:/.../table/data/00000-3-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> c] | [1 -> , 2 -> c] | null
| [4] |
-| s3:/.../table/data/00001-4-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> b] | [1 -> , 2 -> b] | null
| [4] |
-| s3:/.../table/data/00002-5-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> a] | [1 -> , 2 -> a] | null
| [4] |
-+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+
++---------+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+--------------+
+| content | file_path
| file_format | record_count | file_size_in_bytes | column_sizes |
value_counts | null_value_counts | lower_bounds | upper_bounds |
key_metadata | split_offsets | equality_ids |
++---------+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+--------------+
+| 0 |
s3:/.../table/data/00000-3-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> c] | [1 -> , 2 -> c] | null
| [4] | null |
+| 0 |
s3:/.../table/data/00001-4-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> b] | [1 -> , 2 -> b] | null
| [4] | null |
+| 0 |
s3:/.../table/data/00002-5-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> a] | [1 -> , 2 -> a] | null
| [4] | null |
++-----------------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+--------------+
+```
Review comment:
I think the `content` column will be useful to differentiate between
data and delete files. But I do agree that the int column is not useful. We
should probably display the FileContent name here?
##########
File path: site/docs/spark.md
##########
@@ -619,13 +619,30 @@ To show a table's data files and each file's metadata,
run:
SELECT * FROM prod.db.table.files
```
```text
-+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+
-| file_path |
file_format | record_count | file_size_in_bytes | column_sizes |
value_counts | null_value_counts | lower_bounds | upper_bounds |
key_metadata | split_offsets |
-+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+
-| s3:/.../table/data/00000-3-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> c] | [1 -> , 2 -> c] | null
| [4] |
-| s3:/.../table/data/00001-4-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> b] | [1 -> , 2 -> b] | null
| [4] |
-| s3:/.../table/data/00002-5-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> a] | [1 -> , 2 -> a] | null
| [4] |
-+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+
++---------+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+--------------+
+| content | file_path
| file_format | record_count | file_size_in_bytes | column_sizes |
value_counts | null_value_counts | lower_bounds | upper_bounds |
key_metadata | split_offsets | equality_ids |
++---------+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+--------------+
+| 0 |
s3:/.../table/data/00000-3-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> c] | [1 -> , 2 -> c] | null
| [4] | null |
+| 0 |
s3:/.../table/data/00001-4-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> b] | [1 -> , 2 -> b] | null
| [4] | null |
+| 0 |
s3:/.../table/data/00002-5-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> a] | [1 -> , 2 -> a] | null
| [4] | null |
++-----------------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+--------------+
+```
+
+### Entries
+
+To show a table's manifest entries, run:
+
+```sql
+SELECT * FROM prod.db.table.entries
+```
+```text
++--------+----------------+-----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
+| status | snapshot_id | sequence_number | data_file
|
++--------+----------------+-----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
+| 1 | 57897183625154 | 0 | [0,
s3:/.../table/data/00000-3-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet,
PARQUET, 1, 597, [1 -> 90, 2 -> 62], [1 -> 1, 2 -> 1], [1 -> 0, 2 -> 0], [1 ->
, 2 -> c], [1 -> , 2 -> c], null, [4], null] |
+| 1 | 57897183625154 | 0 | [0,
s3:/.../table/data/00001-4-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet,
PARQUET, 1, 597, [1 -> 90, 2 -> 62], [1 -> 1, 2 -> 1], [1 -> 0, 2 -> 0], [1 ->
, 2 -> b], [1 -> , 2 -> b], null, [4], null] |
+| 1 | 57897183625154 | 0 | [0,
s3:/.../table/data/00002-5-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet,
PARQUET, 1, 597, [1 -> 90, 2 -> 62], [1 -> 1, 2 -> 1], [1 -> 0, 2 -> 0], [1 ->
, 2 -> a], [1 -> , 2 -> a], null, [4], null] |
++--------+----------------+-----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
Review comment:
I think that makes sense. As I mentioned in #1365, I wasn't sure whether
this was intentional, so I just went ahead and added it. I can remove this. I
have found the `entries` table to be really useful for looking at the set of
files that have come between two snapshots (or in a set of snapshots).
##########
File path: site/docs/spark.md
##########
@@ -619,13 +619,30 @@ To show a table's data files and each file's metadata,
run:
SELECT * FROM prod.db.table.files
```
```text
-+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+
-| file_path |
file_format | record_count | file_size_in_bytes | column_sizes |
value_counts | null_value_counts | lower_bounds | upper_bounds |
key_metadata | split_offsets |
-+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+
-| s3:/.../table/data/00000-3-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> c] | [1 -> , 2 -> c] | null
| [4] |
-| s3:/.../table/data/00001-4-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> b] | [1 -> , 2 -> b] | null
| [4] |
-| s3:/.../table/data/00002-5-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> a] | [1 -> , 2 -> a] | null
| [4] |
-+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+
++---------+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+--------------+
+| content | file_path
| file_format | record_count | file_size_in_bytes | column_sizes |
value_counts | null_value_counts | lower_bounds | upper_bounds |
key_metadata | split_offsets | equality_ids |
++---------+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+--------------+
+| 0 |
s3:/.../table/data/00000-3-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> c] | [1 -> , 2 -> c] | null
| [4] | null |
+| 0 |
s3:/.../table/data/00001-4-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> b] | [1 -> , 2 -> b] | null
| [4] | null |
+| 0 |
s3:/.../table/data/00002-5-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> a] | [1 -> , 2 -> a] | null
| [4] | null |
++-----------------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+--------------+
+```
Review comment:
Seems like the files table only returns data files and not delete files,
so content will always be `0`. We can remove this field. Same with
`equalityIds`, this will always be null since the files table only covers data
files.
##########
File path: site/docs/spark.md
##########
@@ -619,13 +619,30 @@ To show a table's data files and each file's metadata,
run:
SELECT * FROM prod.db.table.files
```
```text
-+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+
-| file_path |
file_format | record_count | file_size_in_bytes | column_sizes |
value_counts | null_value_counts | lower_bounds | upper_bounds |
key_metadata | split_offsets |
-+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+
-| s3:/.../table/data/00000-3-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> c] | [1 -> , 2 -> c] | null
| [4] |
-| s3:/.../table/data/00001-4-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> b] | [1 -> , 2 -> b] | null
| [4] |
-| s3:/.../table/data/00002-5-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> a] | [1 -> , 2 -> a] | null
| [4] |
-+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+
++---------+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+--------------+
+| content | file_path
| file_format | record_count | file_size_in_bytes | column_sizes |
value_counts | null_value_counts | lower_bounds | upper_bounds |
key_metadata | split_offsets | equality_ids |
++---------+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+--------------+
+| 0 |
s3:/.../table/data/00000-3-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> c] | [1 -> , 2 -> c] | null
| [4] | null |
+| 0 |
s3:/.../table/data/00001-4-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> b] | [1 -> , 2 -> b] | null
| [4] | null |
+| 0 |
s3:/.../table/data/00002-5-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> a] | [1 -> , 2 -> a] | null
| [4] | null |
++-----------------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+--------------+
+```
Review comment:
I think the `content` column will be useful to differentiate between
data and delete files. But I do agree that the int column is not useful. We
should probably display the FileContent name here?
##########
File path: site/docs/spark.md
##########
@@ -619,13 +619,30 @@ To show a table's data files and each file's metadata,
run:
SELECT * FROM prod.db.table.files
```
```text
-+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+
-| file_path |
file_format | record_count | file_size_in_bytes | column_sizes |
value_counts | null_value_counts | lower_bounds | upper_bounds |
key_metadata | split_offsets |
-+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+
-| s3:/.../table/data/00000-3-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> c] | [1 -> , 2 -> c] | null
| [4] |
-| s3:/.../table/data/00001-4-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> b] | [1 -> , 2 -> b] | null
| [4] |
-| s3:/.../table/data/00002-5-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> a] | [1 -> , 2 -> a] | null
| [4] |
-+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+
++---------+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+--------------+
+| content | file_path
| file_format | record_count | file_size_in_bytes | column_sizes |
value_counts | null_value_counts | lower_bounds | upper_bounds |
key_metadata | split_offsets | equality_ids |
++---------+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+--------------+
+| 0 |
s3:/.../table/data/00000-3-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> c] | [1 -> , 2 -> c] | null
| [4] | null |
+| 0 |
s3:/.../table/data/00001-4-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> b] | [1 -> , 2 -> b] | null
| [4] | null |
+| 0 |
s3:/.../table/data/00002-5-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> a] | [1 -> , 2 -> a] | null
| [4] | null |
++-----------------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+--------------+
+```
+
+### Entries
+
+To show a table's manifest entries, run:
+
+```sql
+SELECT * FROM prod.db.table.entries
+```
+```text
++--------+----------------+-----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
+| status | snapshot_id | sequence_number | data_file
|
++--------+----------------+-----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
+| 1 | 57897183625154 | 0 | [0,
s3:/.../table/data/00000-3-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet,
PARQUET, 1, 597, [1 -> 90, 2 -> 62], [1 -> 1, 2 -> 1], [1 -> 0, 2 -> 0], [1 ->
, 2 -> c], [1 -> , 2 -> c], null, [4], null] |
+| 1 | 57897183625154 | 0 | [0,
s3:/.../table/data/00001-4-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet,
PARQUET, 1, 597, [1 -> 90, 2 -> 62], [1 -> 1, 2 -> 1], [1 -> 0, 2 -> 0], [1 ->
, 2 -> b], [1 -> , 2 -> b], null, [4], null] |
+| 1 | 57897183625154 | 0 | [0,
s3:/.../table/data/00002-5-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet,
PARQUET, 1, 597, [1 -> 90, 2 -> 62], [1 -> 1, 2 -> 1], [1 -> 0, 2 -> 0], [1 ->
, 2 -> a], [1 -> , 2 -> a], null, [4], null] |
++--------+----------------+-----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
Review comment:
I think that makes sense. As I mentioned in #1365, I wasn't sure whether
this was intentional, so I just went ahead and added it. I can remove this. I
have found the `entries` table to be really useful for looking at the set of
files that have come between two snapshots (or in a set of snapshots).
##########
File path: site/docs/spark.md
##########
@@ -619,13 +619,30 @@ To show a table's data files and each file's metadata,
run:
SELECT * FROM prod.db.table.files
```
```text
-+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+
-| file_path |
file_format | record_count | file_size_in_bytes | column_sizes |
value_counts | null_value_counts | lower_bounds | upper_bounds |
key_metadata | split_offsets |
-+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+
-| s3:/.../table/data/00000-3-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> c] | [1 -> , 2 -> c] | null
| [4] |
-| s3:/.../table/data/00001-4-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> b] | [1 -> , 2 -> b] | null
| [4] |
-| s3:/.../table/data/00002-5-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> a] | [1 -> , 2 -> a] | null
| [4] |
-+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+
++---------+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+--------------+
+| content | file_path
| file_format | record_count | file_size_in_bytes | column_sizes |
value_counts | null_value_counts | lower_bounds | upper_bounds |
key_metadata | split_offsets | equality_ids |
++---------+-------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+--------------+
+| 0 |
s3:/.../table/data/00000-3-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> c] | [1 -> , 2 -> c] | null
| [4] | null |
+| 0 |
s3:/.../table/data/00001-4-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> b] | [1 -> , 2 -> b] | null
| [4] | null |
+| 0 |
s3:/.../table/data/00002-5-8d6d60e8-d427-4809-bcf0-f5d45a4aad96.parquet |
PARQUET | 1 | 597 | [1 -> 90, 2 -> 62] | [1 -> 1,
2 -> 1] | [1 -> 0, 2 -> 0] | [1 -> , 2 -> a] | [1 -> , 2 -> a] | null
| [4] | null |
++-----------------------------------------------------------------------------------+-------------+--------------+--------------------+--------------------+------------------+-------------------+-----------------+-----------------+--------------+---------------+--------------+
+```
Review comment:
Seems like the files table only returns data files and not delete files,
so content will always be `0`. We can remove this field. Same with
`equalityIds`, this will always be null since the files table only covers data
files.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]