rdblue commented on a change in pull request #1871:
URL: https://github.com/apache/iceberg/pull/1871#discussion_r535533762
##########
File path: site/docs/configuration.md
##########
@@ -96,6 +96,13 @@ The following properties from the Hadoop configuration are used by the Hive Meta
| iceberg.hive.client-pool-size | 5 | The size of the Hive client pool when tracking tables in HMS |
| iceberg.hive.lock-timeout-ms | 180000 (3 min) | Maximum time in milliseconds to acquire a lock |
+The following properties from the Hadoop configuration are used by Hadoop Tables
+
+| Property | Default | Description |
+| --------------------------------------------- | -------- | ---------------------------------------------------------------------- |
+| iceberg.engine.hadoop.`SCHEME`.atomic.write | false | Controls whether atomic write will be used instead of atomic rename. `SCHEME` is the scheme used, e.g. `{hdfs, cos}` |
Review comment:
I'm not sure that this is a good way to configure this feature. We avoid
using Hadoop `Configuration` because we don't want to depend on it for
configuration except when we use Hadoop components.

Also, I don't think that users should normally need to add these
configurations. If we know that a FS has some feature, then we should default
to using that feature for that FS. A good example is locality in HDFS: we
detect HDFS URIs and get locality for that file system, but skip it for S3.

I would expect this to do the same: the `HadoopTableOperations`
implementation detects certain file systems and chooses the right atomic
operation for them, based on the table location. We can also add configuration
to set the list of those file systems, passed in when the
`HadoopTableOperations` is created.

Configuring the `TableOperations` is primarily how we want to customize
behavior, because we have good ways of injecting your own table operations or
using existing table operations.
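
As a rough sketch of the scheme-based detection described above: the class
below picks an atomic operation from the table location's URI scheme, using a
set of schemes passed in at construction time. All names here
(`AtomicCommitSelector`, `CommitStrategy`) are hypothetical and not existing
Iceberg API, and treating `cos` as a file system that should use atomic write
is an assumption taken from the example in the diff.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of Iceberg: chooses a commit strategy
// from the scheme of the table location.
final class AtomicCommitSelector {
  // Illustrative strategy names only.
  enum CommitStrategy { ATOMIC_RENAME, ATOMIC_WRITE }

  // Schemes (lowercase) whose file systems should use atomic write.
  private final Set<String> atomicWriteSchemes;

  AtomicCommitSelector(Set<String> atomicWriteSchemes) {
    this.atomicWriteSchemes = Set.copyOf(atomicWriteSchemes);
  }

  CommitStrategy strategyFor(String tableLocation) {
    // getScheme() returns null for scheme-less locations such as "/tmp/t".
    String scheme = URI.create(tableLocation).getScheme();
    if (scheme != null
        && atomicWriteSchemes.contains(scheme.toLowerCase(Locale.ROOT))) {
      return CommitStrategy.ATOMIC_WRITE;
    }
    // Default to atomic rename, matching the `false` default in the diff.
    return CommitStrategy.ATOMIC_RENAME;
  }
}
```

A table operations implementation could be handed such a selector (or just the
set of schemes) when it is created, so no Hadoop `Configuration` lookup is
needed for this decision.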