rdblue commented on a change in pull request #1221:
URL: https://github.com/apache/iceberg/pull/1221#discussion_r458276824
##########
File path: site/docs/configuration.md
##########
@@ -109,14 +110,14 @@ spark.read
.table("catalog.db.table")
```
-| Spark option    | Default               | Description |
-| --------------- | --------------------- | ----------------------------------------------------------------------------------------- |
-| snapshot-id     | (latest)              | Snapshot ID of the table snapshot to read |
-| as-of-timestamp | (latest)              | A timestamp in milliseconds; the snapshot used will be the snapshot current at this time. |
-| split-size      | As per table property | Overrides this table's read.split.target-size and read.split.metadata-target-size |
-| lookback        | As per table property | Overrides this table's read.split.planning-lookback |
-| file-open-cost  | As per table property | Overrides this table's read.split.open-file-cost |
-
+| Spark option               | Default               | Description |
+| -------------------------- | --------------------- | ----------------------------------------------------------------------------------------- |
+| snapshot-id                | (latest)              | Snapshot ID of the table snapshot to read |
+| as-of-timestamp            | (latest)              | A timestamp in milliseconds; the snapshot used will be the snapshot current at this time. |
+| split-size                 | As per table property | Overrides this table's read.split.target-size and read.split.metadata-target-size |
+| lookback                   | As per table property | Overrides this table's read.split.planning-lookback |
+| file-open-cost             | As per table property | Overrides this table's read.split.open-file-cost |
+| use-approximate-statistics | As per table property | Overrides this table's read.spark.read.spark.use-approximate-statistics |
Review comment:
I'm not sure I understand the case you're talking about. What I'm saying
is that because we are using reliable stats that are maintained in the table
metadata, we can always use them if there are no filters. That takes care of
the bad case in Spark 2.4. Since Spark 2.4 doesn't ever push filters before
calling `estimateStatistics`, it will always use metadata stats and will avoid
the issue entirely.
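
To make the rule concrete, here is a minimal sketch in Python. It is not Iceberg's implementation: `Stats`, `estimate_statistics`, and the parameters `row_size_bytes` and `planned_file_bytes` are hypothetical names chosen for illustration, the snapshot summary is assumed to carry the row count under a `total-records` key, and the fallback branch for filtered scans is only an assumption about what a reader might do.

```python
from dataclasses import dataclass
from typing import Mapping, Optional, Sequence


@dataclass(frozen=True)
class Stats:
    """Estimate handed to the query planner (hypothetical type)."""
    num_rows: Optional[int]   # None when no reliable row count is available
    size_in_bytes: int
    source: str               # "metadata" or "approximate"


def estimate_statistics(
    pushed_filters: Sequence[str],
    snapshot_summary: Mapping[str, str],
    row_size_bytes: int,
    planned_file_bytes: int,
) -> Stats:
    """Hypothetical stand-in for a reader's estimateStatistics call."""
    total_records = snapshot_summary.get("total-records")

    # No filters pushed: the row count kept in table metadata describes
    # exactly what will be read, so it can be used directly.
    if not pushed_filters and total_records is not None:
        rows = int(total_records)
        return Stats(rows, rows * row_size_bytes, "metadata")

    # Filters were pushed (or the summary has no count): metadata totals
    # would overestimate, so fall back to the size of the planned files.
    # This branch is an assumption, not taken from the PR.
    return Stats(None, planned_file_bytes, "approximate")
```

Under this sketch, an engine that never pushes filters before asking for statistics, as described for Spark 2.4 above, always takes the first branch.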