[GitHub] [iceberg] RussellSpitzer commented on a change in pull request #4255: Docs: Update section on inspecting tables

GitBox Thu, 03 Mar 2022 10:14:06 -0800


RussellSpitzer commented on a change in pull request #4255:
URL: https://github.com/apache/iceberg/pull/4255#discussion_r818934772




##########
File path: docs/versioned/spark/spark-queries.md
##########
@@ -168,7 +168,7 @@ To inspect a table's history, snapshots, and other 
metadata, Iceberg supports me
 Metadata tables are identified by adding the metadata table name after the 
original table name. For example, history for `db.table` is read using 
`db.table.history`.
 
 {{< hint info >}}
-As of Spark 3.0, the format of the table name for inspection 
(`catalog.database.table.metadata`) doesn't work with Spark's default catalog 
(`spark_catalog`). If you've replaced the default catalog, you may want to use 
`DataFrameReader` API to inspect the table. 
+In Spark 3.0 and 3.1, if you have replaced Spark's default catalog 
(`spark_catalog`) with Iceberg's `SparkSessionCatalog`, you cannot use it to 
query metadata tables, as the form of the metadata table name 
(`catalog.database.table.metadata`) is not accepted by Spark. This is fixed in 
Spark 3.2 by [SPARK-34209](https://issues.apache.org/jira/browse/SPARK-34209). 
For Spark 3.0 and 3.1, you can configure a different catalog (implemented by 
`SparkCatalog`) to query metadata tables, or you can use the `DataFrameReader` 
API.

Review comment:
       I would make the wording a bit more specific here. There are a lot 
of pronouns, which I think makes it a bit difficult to reason about. I also 
don't think we need the Jira link, since that would mostly help devs and not 
users, but feel free to keep it if you think it adds value.
   
   In Spark 3.0 and 3.1, the Spark Session catalog (`spark_catalog`) does not 
support table names with multipart identifiers such as 
`catalog.database.table.metadata`. To work around the lack of support for 
multipart identifiers, configure a non-session catalog using the Iceberg 
`SparkCatalog` class or use the Spark `DataFrameReader` API. Spark 3.2 and 
later support multipart identifiers in the Spark Session catalog.
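
For illustration, the two workarounds could be sketched roughly as below. This is a minimal sketch, not wording for the docs: the catalog name `local`, the warehouse path, and the helper functions `iceberg_catalog_conf` and `metadata_table` are hypothetical and only show which Spark properties and identifiers are involved.

```python
# Sketch only: "local" and the warehouse path are hypothetical placeholders.
ICEBERG_SPARK_CATALOG = "org.apache.iceberg.spark.SparkCatalog"


def iceberg_catalog_conf(name, warehouse, catalog_type="hadoop"):
    """Spark properties that register a non-session Iceberg catalog called `name`."""
    prefix = f"spark.sql.catalog.{name}"
    return {
        prefix: ICEBERG_SPARK_CATALOG,
        f"{prefix}.type": catalog_type,
        f"{prefix}.warehouse": warehouse,
    }


def metadata_table(catalog, database, table, metadata):
    """Multipart identifier of a metadata table, e.g. local.db.table.history."""
    return ".".join([catalog, database, table, metadata])


conf = iceberg_catalog_conf("local", "/tmp/iceberg-warehouse")
history = metadata_table("local", "db", "table", "history")

# With a SparkSession built from `conf` (needs pyspark and the Iceberg runtime jar):
#   spark.sql(f"SELECT * FROM {history}")
# DataFrameReader alternative, usable with the session catalog as well:
#   spark.read.format("iceberg").load("db.table.history")
```

The properties in `conf` would be passed to the session builder (or `spark-defaults.conf`) before the session starts; the commented lines show where the multipart name is then used.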




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


