tomtongue commented on code in PR #8324:
URL: https://github.com/apache/iceberg/pull/8324#discussion_r1295794660
##########
docs/spark-procedures.md:
##########
@@ -277,6 +277,10 @@ Used to remove files which are not referenced in any metadata files of an Iceber
| `dry_run` | | boolean | When true, don't actually remove files (defaults to false) |
| `max_concurrent_deletes` | | int | Size of the thread pool used for delete file actions (by default, no thread pool is used) |
+{{< hint warning >}}
+The timestamp within 24 hours cannot be set to `older_than`. For testing `remove_orphan_files`, configure `spark.testing` to true in the SparkSession object.
Review Comment:
Thanks for the kind review and suggestion.

I think documenting the 24-hour limitation would help users, because I have seen users discover it only after running `remove_orphan_files` with an `older_than` value within the last 24 hours. Let me add more context on this.
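
To make the 24-hour limitation concrete, here is a minimal Python sketch of the kind of guard described above. It is not Iceberg's implementation: `check_older_than` and `MIN_ORPHAN_AGE` are hypothetical names, the error message is illustrative, and the sketch assumes the check compares the current time minus `older_than` against 24 hours and is skipped when `spark.testing` is true, as the proposed doc text states.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical constant mirroring the documented 24-hour limit on `older_than`.
MIN_ORPHAN_AGE = timedelta(hours=24)


def check_older_than(older_than: datetime, now: datetime, testing: bool = False) -> None:
    """Raise ValueError if `older_than` falls within MIN_ORPHAN_AGE of `now`.

    `testing` is an assumed stand-in for the `spark.testing` session conf:
    when it is true, the guard is skipped entirely.
    """
    if testing:
        return
    if now - older_than < MIN_ORPHAN_AGE:
        # Illustrative message, not the exact text Iceberg raises.
        raise ValueError(
            "Cannot remove orphan files with an interval less than 24 hours"
        )


if __name__ == "__main__":
    now = datetime(2023, 8, 17, 12, 0, tzinfo=timezone.utc)
    check_older_than(now - timedelta(days=3), now)  # older than 24 hours: accepted
    check_older_than(now - timedelta(hours=1), now, testing=True)  # guard skipped
    try:
        check_older_than(now - timedelta(hours=1), now)  # within 24 hours
    except ValueError as err:
        print(f"rejected: {err}")
```

Keeping the check a pure function of `now` makes the boundary easy to reason about: a timestamp exactly 24 hours old passes, and anything newer is rejected unless the testing flag is set.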

Regarding the `spark.testing` option, your concern makes sense, and I share it.

I believe the underlying problem is that some users want to set `older_than` within the last 24 hours to test `remove_orphan_files` or to check its behaviour. It's important to guide Iceberg users correctly on how to use `spark.testing`, but at the same time I think `spark.testing` would help them. For example, the doc could say:

*For testing purposes or for checking the behaviour (not in production), you can set `older_than` to a timestamp within the last 24 hours by setting `spark.testing` to true in the SparkSession. If you do so, we recommend also setting the `dry_run` option to true.*
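
Following the suggested wording, a test run could pair a recent `older_than` with `dry_run => true`. The Python sketch below only builds the SQL `CALL` statement; `remove_orphan_files_sql`, the catalog name `my_catalog`, and the table `db.sample` are hypothetical, and it assumes the `TIMESTAMP` literal is interpreted in the Spark session time zone.

```python
from datetime import datetime


def remove_orphan_files_sql(catalog: str, table: str, older_than: datetime, dry_run: bool = True) -> str:
    """Build a `remove_orphan_files` CALL statement (hypothetical helper).

    `dry_run` defaults to True so that a short interval only lists the
    orphan-file candidates instead of deleting them.
    """
    # Millisecond precision, e.g. '2023-08-17 11:00:00.000'.
    ts = older_than.strftime("%Y-%m-%d %H:%M:%S.%f")[:-3]
    flag = "true" if dry_run else "false"
    return (
        f"CALL {catalog}.system.remove_orphan_files("
        f"table => '{table}', "
        f"older_than => TIMESTAMP '{ts}', "
        f"dry_run => {flag})"
    )


# Usage with PySpark (not executed here; assumes an Iceberg catalog named
# `my_catalog` is configured on the session):
#   spark = SparkSession.builder.config("spark.testing", "true").getOrCreate()
#   spark.sql(remove_orphan_files_sql(
#       "my_catalog", "db.sample", datetime.now() - timedelta(hours=1))).show()

print(remove_orphan_files_sql("my_catalog", "db.sample", datetime(2023, 8, 17, 11, 0)))
```

Defaulting `dry_run` to true keeps a short-interval run non-destructive: per the `dry_run` row in the table, the procedure then does not actually remove files.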

What do you think about adding `spark.testing` to the doc, and about the phrase above?