[GitHub] [iceberg] rdblue commented on a change in pull request #1871: Add Atomic Write support for Hadoop Tables

GitBox Thu, 03 Dec 2020 11:42:55 -0800


rdblue commented on a change in pull request #1871:
URL: https://github.com/apache/iceberg/pull/1871#discussion_r535533762




##########
File path: site/docs/configuration.md
##########
@@ -96,6 +96,13 @@ The following properties from the Hadoop configuration are used by the Hive Meta
 | iceberg.hive.client-pool-size      | 5                | The size of the Hive client pool when tracking tables in HMS  |
 | iceberg.hive.lock-timeout-ms       | 180000 (3 min)   | Maximum time in milliseconds to acquire a lock                |
 
+The following properties from the Hadoop configuration are used by Hadoop Tables
+
+| Property                                      | Default  | Description                                                            |
+| --------------------------------------------- | -------- | ---------------------------------------------------------------------- |
+| iceberg.engine.hadoop.`SCHEME`.atomic.write   | false    | Controls whether atomic write will be used instead of atomic rename. `SCHEME` is the scheme used, e.g. `{hdfs, cos}`    |

Review comment:
       I'm not sure that this is a good way to configure this feature. We avoid 
using Hadoop `Configuration` because we don't want to depend on it for 
configuration, except where we actually use Hadoop components.
   
   Also, I don't think that users should normally need to add these 
configurations. If we know that a FS has some feature, then we should default 
to using that feature for that FS. A good example is locality in HDFS: we detect 
HDFS URIs and get locality for that file system, but skip it for S3.
   
   I would expect this to work the same way: the `HadoopTableOperations` 
implementation detects certain file systems and chooses the right atomic 
operation for them, based on the table location. We can also add configuration 
for that list of file systems, passed in when the `HadoopTableOperations` is 
created.
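
   A minimal sketch of that kind of scheme-based detection, assuming a 
hypothetical `AtomicCommitChooser` helper and `CommitStrategy` enum (neither is 
part of Iceberg or of this PR). The set of schemes that use atomic write is 
passed in at construction time, and `cos` appears only because the PR's docs 
mention it as an example scheme:

```java
import java.net.URI;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not an Iceberg class: picks the commit strategy
// from the scheme of the table location instead of a Hadoop Configuration key.
final class AtomicCommitChooser {

  // Hypothetical names for the two commit mechanisms discussed in the PR.
  public enum CommitStrategy { ATOMIC_RENAME, ATOMIC_WRITE }

  private final Set<String> atomicWriteSchemes = new HashSet<>();

  // The list of file system schemes is injected when the chooser is created,
  // mirroring "passed in when the HadoopTableOperations is created".
  public AtomicCommitChooser(Set<String> schemes) {
    for (String scheme : schemes) {
      atomicWriteSchemes.add(scheme.toLowerCase(Locale.ROOT));
    }
  }

  // Returns ATOMIC_WRITE only when the location's scheme is in the injected
  // set; locations without a scheme, or with any other scheme, keep the
  // existing atomic-rename behavior.
  public CommitStrategy strategyFor(String tableLocation) {
    String scheme = URI.create(tableLocation).getScheme();
    if (scheme != null
        && atomicWriteSchemes.contains(scheme.toLowerCase(Locale.ROOT))) {
      return CommitStrategy.ATOMIC_WRITE;
    }
    return CommitStrategy.ATOMIC_RENAME;
  }
}
```

   With `new AtomicCommitChooser(Set.of("cos"))`, a location under `cos://` 
would select atomic write, while `hdfs://` locations and bare paths would keep 
using atomic rename, with no per-scheme property for users to set.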
   
   Configuring the `TableOperations` is the main way we want to support 
customization, because we have good ways of injecting your own table operations 
or using existing ones.




