szehon-ho commented on code in PR #4709:
URL: https://github.com/apache/iceberg/pull/4709#discussion_r979317742
##########
docs/spark-queries.md:
##########
@@ -394,3 +394,22 @@
spark.read.format("iceberg").load("db.table.files").show(truncate = false)
// Hadoop path table
spark.read.format("iceberg").load("hdfs://nn:8020/path/to/table#files").show(truncate
= false)
Review Comment:
Sorry, it's unrelated, but could you also remove the `.show(truncate = false)` here? I should have added this comment in the previous PR. This doc should just explain how to load the DataFrame, not necessarily how to show it.
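For reference, the snippet with those calls removed might look roughly like this (a sketch only; the `filesDF` / `pathFilesDF` names are illustrative):

```scala
// catalog table
val filesDF = spark.read.format("iceberg").load("db.table.files")
// Hadoop path table
val pathFilesDF = spark.read.format("iceberg").load("hdfs://nn:8020/path/to/table#files")
```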
##########
docs/spark-queries.md:
##########
@@ -394,3 +394,22 @@
spark.read.format("iceberg").load("db.table.files").show(truncate = false)
// Hadoop path table
spark.read.format("iceberg").load("hdfs://nn:8020/path/to/table#files").show(truncate
= false)
```
+
+### Time Travel with Metadata Tables
+
+To inspect a table's metadata with the time travel feature:
+
+```sql
+-- get the table's file manifests at timestamp Sep 20, 2021 08:00:00
+SELECT * FROM prod.db.table.manifests TIMESTAMP AS OF '2021-09-20 08:00:00';
+
+-- get the table's partitions with snapshot id 10963874102873L
+SELECT * FROM prod.db.table.partitions VERSION AS OF 10963874102873;
+```
+
+Metadata tables can also be inspected with time travel using the DataFrameReader API:
+
+```scala
+// get table's data files and each data file's metadata at snapshot-id 10963874102873
Review Comment:
What do you think about simplifying it to something like `// Load the table's file metadata at snapshot-id ... as a DataFrame`?
(Also, the files table may contain delete files as well, hence removing 'data'.)
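For example, the reworded line might look roughly like this (a sketch only; the rest of the quoted line is kept as-is):

```scala
// Load the table's file metadata at snapshot-id 10963874102873 as a DataFrame
spark.read.format("iceberg").option("snapshot-id", 10963874102873L).load("db.table.files").show()
```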
##########
docs/spark-queries.md:
##########
@@ -394,3 +394,22 @@
spark.read.format("iceberg").load("db.table.files").show(truncate = false)
// Hadoop path table
spark.read.format("iceberg").load("hdfs://nn:8020/path/to/table#files").show(truncate
= false)
```
+
+### Time Travel with Metadata Tables
+
+To inspect a table's metadata with the time travel feature:
+
+```sql
+-- get the table's file manifests at timestamp Sep 20, 2021 08:00:00
+SELECT * FROM prod.db.table.manifests TIMESTAMP AS OF '2021-09-20 08:00:00';
Review Comment:
Nit: can we remove the trailing semicolon for consistency?
##########
docs/spark-queries.md:
##########
@@ -394,3 +394,22 @@
spark.read.format("iceberg").load("db.table.files").show(truncate = false)
// Hadoop path table
spark.read.format("iceberg").load("hdfs://nn:8020/path/to/table#files").show(truncate
= false)
```
+
+### Time Travel with Metadata Tables
+
+To inspect a table's metadata with the time travel feature:
+
+```sql
+-- get the table's file manifests at timestamp Sep 20, 2021 08:00:00
+SELECT * FROM prod.db.table.manifests TIMESTAMP AS OF '2021-09-20 08:00:00';
+
+-- get the table's partitions with snapshot id 10963874102873L
+SELECT * FROM prod.db.table.partitions VERSION AS OF 10963874102873;
+```
+
+Metadata tables can also be inspected with time travel using the DataFrameReader API:
+
+```scala
+// get table's data files and each data file's metadata at snapshot-id 10963874102873
+spark.read.format("iceberg").option("snapshot-id", 10963874102873L).load("db.table.files").show()
Review Comment:
Can we remove `show()`? I know the other one had it, but I just realized it's a bit unnecessary (this doc should just explain how to load the metadata table as a DataFrame, just like the previous section explains how to load the data table as a DataFrame, not necessarily how to show it).
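Putting both suggestions together, the example might end up roughly like this (a sketch only; the `filesDF` name is illustrative):

```scala
// Load the table's file metadata at snapshot-id 10963874102873 as a DataFrame
val filesDF = spark.read.format("iceberg").option("snapshot-id", 10963874102873L).load("db.table.files")
```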