szehon-ho commented on a change in pull request #4255:
URL: https://github.com/apache/iceberg/pull/4255#discussion_r827424145
##########
File path: docs/spark/spark-queries.md
##########
@@ -168,7 +168,9 @@ To inspect a table's history, snapshots, and other metadata, Iceberg supports me
 Metadata tables are identified by adding the metadata table name after the original table name. For example, history for `db.table` is read using `db.table.history`.
 {{< hint info >}}
-As of Spark 3.0, the format of the table name for inspection (`catalog.database.table.metadata`) doesn't work with Spark's default catalog (`spark_catalog`). If you've replaced the default catalog, you may want to use `DataFrameReader` API to inspect the table.
+For Spark 2.4, use the `DataFrameReader` API to [inspect tables](#inspecting-with-dataframes).
+
+For Spark 3, prior to 3.2, the Spark session catalog (`spark_catalog`) does not support table names with multipart identifiers such as `catalog.database.table.metadata`. To work around this, for querying metadata tables, configure a different catalog that uses the Iceberg `SparkCatalog` class, or use the Spark `DataFrameReader` API. From Spark 3.2 onwards, the session catalog supports table names with multipart identifiers.
Review comment:
Do we need the `spark_catalog` mention here? It seems a bit confusing, as we are talking about the Spark session catalog. It was probably carried over from before, when the text mentioned the default catalog. If it's not needed, I feel we can remove it.
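
For reference, a minimal sketch of the `DataFrameReader` workaround the new text describes; it assumes a running `SparkSession` (`spark`) with the Iceberg Spark runtime on the classpath, and reuses the doc's example table `db.table`:

```scala
// Inspect Iceberg metadata tables via the DataFrameReader API.
// This path works on Spark 2.4 and on Spark 3 versions before 3.2,
// where the session catalog rejects multipart identifiers like
// catalog.database.table.metadata.
val history = spark.read.format("iceberg").load("db.table.history")
history.show(truncate = false)

// Other metadata tables follow the same naming scheme, e.g. snapshots:
val snapshots = spark.read.format("iceberg").load("db.table.snapshots")
snapshots.show(truncate = false)
```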
##########
File path: docs/spark/spark-queries.md
##########
Review comment:
Also, I feel it is still a bit wordy; can we just say:

> As a workaround, configure the catalog `org.apache.iceberg.spark.SparkCatalog` or use the `DataFrameReader` API.

It's more consistent with the previous sentence. I was thinking we don't need 'for querying metadata tables', as that's what this section is about. And I feel there's no need to mention 'From Spark 3.2 onwards', as it's implied the issue is gone after 3.2.
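
For reference, a sketch of the catalog workaround the suggested wording points to; the catalog name `iceberg_catalog` and the `hive` catalog type are illustrative assumptions, and the Iceberg Spark runtime jar is assumed to be on the classpath:

```scala
import org.apache.spark.sql.SparkSession

// Register a separate catalog backed by Iceberg's SparkCatalog so that
// metadata tables can be queried with multipart identifiers on Spark 3
// versions before 3.2. The catalog name "iceberg_catalog" is hypothetical.
val spark = SparkSession.builder()
  .appName("inspect-iceberg-metadata")
  .config("spark.sql.catalog.iceberg_catalog", "org.apache.iceberg.spark.SparkCatalog")
  .config("spark.sql.catalog.iceberg_catalog.type", "hive") // assumes a Hive metastore
  .getOrCreate()

// The metadata table name goes after the full table identifier.
spark.sql("SELECT * FROM iceberg_catalog.db.table.history").show(truncate = false)
```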
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]