alamb commented on code in PR #4918:
URL: https://github.com/apache/arrow-rs/pull/4918#discussion_r1361180987
##########
object_store/src/aws/client.rs:
##########
@@ -200,15 +198,14 @@ impl From<DeleteError> for Error {
#[derive(Debug)]
pub struct S3Config {
pub region: String,
- pub endpoint: String,
+ pub endpoint: Option<String>,
Review Comment:
I double checked that this is not a (breaking) API change: although it changes the type of a pub field, the containing module is not pub:
https://github.com/apache/arrow-rs/blob/95b015cf7b5d57c7fe66a8feada4f48a987cb020/object_store/src/aws/mod.rs#L66
https://docs.rs/object_store/latest/object_store/struct.ObjectMeta.html?search=S3Config#structfield.e_tag
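For reference, a minimal sketch of the visibility reasoning (module and field names simplified to mirror the situation, not the actual crate layout): a `pub` field inside a non-`pub` module is not reachable from outside the crate, so its type is not part of the public API:

```rust
// A `pub` struct inside a private module: `pub` on the struct and its
// fields only matters if the module itself is reachable, which it is not.
mod client {
    #[derive(Debug)]
    pub struct S3Config {
        pub region: String,
        // Changing this field's type only affects code within this crate,
        // because the `client` module is neither `pub` nor re-exported.
        pub endpoint: Option<String>,
    }
}

fn main() {
    // Inside the crate, construction works as usual...
    let config = client::S3Config {
        region: "us-east-1".to_string(),
        endpoint: None,
    };
    println!("{config:?}");
    // ...but a downstream crate cannot even name `client::S3Config`,
    // so changing `endpoint` to `Option<String>` is not an API break.
}
```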
##########
object_store/src/aws/copy.rs:
##########
@@ -39,12 +39,64 @@ pub enum S3CopyIfNotExists {
///
/// [`ObjectStore::copy_if_not_exists`]: crate::ObjectStore::copy_if_not_exists
Header(String, String),
+ /// The name of a DynamoDB table to use for coordination
+ ///
+ /// Encoded as `dynamodb:<TABLE_NAME>` ignoring whitespace
+ ///
+ /// This will use the same region, credentials and endpoint as configured for S3
+ ///
+ /// ## Limitations
+ ///
+ /// Only conditional operations, e.g. `copy_if_not_exists` will be synchronized, and can
+ /// therefore race with non-conditional operations, e.g. `put`, `copy`, or conditional
+ /// operations performed by writers not configured to synchronize with DynamoDB.
+ ///
+ /// Workloads making use of this mechanism **must** ensure:
+ ///
+ /// * Conditional and non-conditional operations are not performed on the same paths
+ /// * Conditional operations are only performed via similarly configured clients
+ ///
+ /// Additionally as the locking mechanism relies on timeouts to detect stale locks,
+ /// performance will be poor for systems that frequently rewrite the same path, instead
+ /// being optimised for systems that primarily create files with paths never used before.
+ ///
+ /// ## Locking Protocol
+ ///
+ /// The DynamoDB schema is as follows:
+ ///
+ /// * A string hash key named `"key"`
+ /// * A numeric [TTL] attribute named `"ttl"`
+ /// * A numeric attribute named `"generation"`
+ ///
+ /// The lock procedure is as follows:
+ ///
+ /// * Error if file exists in S3
+ /// * Create a corresponding record in DynamoDB with the path as the `"key"`
+ /// * On Success: Create object in S3
+ /// * On Conflict:
+ ///   * Periodically check if file exists in S3
+ ///   * After a 60 second timeout attempt to "claim" the lock by incrementing `"generation"`
+ ///   * GOTO start
+ ///
+ /// This is inspired by the [DynamoDB Lock Client] but simplified for the more limited
+ /// requirements of synchronizing object storage.
+ ///
+ /// The major changes are:
+ ///
+ /// * Uses a monotonic generation count instead of a UUID rvn
+ /// * Relies on [TTL] to eventually clean up old locks
+ /// * Uses a hard-coded lease duration of 20 seconds
+ ///
+ /// [TTL]: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/howitworks-ttl.html
+ /// [DynamoDB Lock Client]: https://aws.amazon.com/blogs/database/building-distributed-locks-with-the-dynamodb-lock-client/
+ Dynamo(String),
Review Comment:
I recommend also adding parameters here for TTL and LEASE_EXPIRY, rather than hard-coding them and having to make backwards-incompatible changes in the future; one possible shape is sketched below.
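A purely hypothetical sketch of such a variant, with the timeouts carried as data; the field names are illustrative only, not what this PR implements:

```rust
use std::time::Duration;

/// Hypothetical sketch only: the variant carries its tuning knobs instead
/// of relying on hard-coded constants, so defaults can later change
/// without a breaking API change.
pub enum S3CopyIfNotExists {
    /// Header name and value to set on copy requests
    Header(String, String),
    /// DynamoDB-based coordination with configurable timeouts
    Dynamo {
        /// The name of the DynamoDB table to use for coordination
        table_name: String,
        /// TTL offset encoded into DynamoDB records (`None` = crate default)
        ttl: Option<Duration>,
        /// How long a lease remains valid before another writer may claim
        /// it (`None` = crate default)
        lease_expiry: Option<Duration>,
    },
}
```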
##########
object_store/src/aws/copy.rs:
##########
@@ -39,12 +39,64 @@ pub enum S3CopyIfNotExists {
///
/// [`ObjectStore::copy_if_not_exists`]: crate::ObjectStore::copy_if_not_exists
Header(String, String),
+ /// The name of a DynamoDB table to use for coordination
+ ///
+ /// Encoded as `dynamodb:<TABLE_NAME>` ignoring whitespace
+ ///
+ /// This will use the same region, credentials and endpoint as configured for S3
+ ///
+ /// ## Limitations
+ ///
+ /// Only conditional operations, e.g. `copy_if_not_exists` will be synchronized, and can
+ /// therefore race with non-conditional operations, e.g. `put`, `copy`, or conditional
+ /// operations performed by writers not configured to synchronize with DynamoDB.
+ ///
+ /// Workloads making use of this mechanism **must** ensure:
+ ///
+ /// * Conditional and non-conditional operations are not performed on the same paths
+ /// * Conditional operations are only performed via similarly configured clients
+ ///
+ /// Additionally as the locking mechanism relies on timeouts to detect stale locks,
+ /// performance will be poor for systems that frequently rewrite the same path, instead
+ /// being optimised for systems that primarily create files with paths never used before.
+ ///
+ /// ## Locking Protocol
+ ///
+ /// The DynamoDB schema is as follows:
+ ///
+ /// * A string hash key named `"key"`
+ /// * A numeric [TTL] attribute named `"ttl"`
+ /// * A numeric attribute named `"generation"`
+ ///
+ /// The lock procedure is as follows:
+ ///
+ /// * Error if file exists in S3
+ /// * Create a corresponding record in DynamoDB with the path as the `"key"`
+ /// * On Success: Create object in S3
+ /// * On Conflict:
+ ///   * Periodically check if file exists in S3
+ ///   * After a 60 second timeout attempt to "claim" the lock by incrementing `"generation"`
+ ///   * GOTO start
+ ///
+ /// This is inspired by the [DynamoDB Lock Client] but simplified for the more limited
+ /// requirements of synchronizing object storage.
+ ///
+ /// The major changes are:
+ ///
+ /// * Uses a monotonic generation count instead of a UUID rvn
Review Comment:
A monotonic generation can have collisions between multiple client writers who are claiming expired locks, right? Why not use a UUID?
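For what it's worth, collisions only matter if the claim is an unguarded write. Below is a sketch (hypothetical request-building code, not the PR's implementation) of a claim guarded by a DynamoDB `ConditionExpression`, under which two writers that both read `generation = 5` cannot both succeed:

```rust
use serde_json::json;

/// Hypothetical sketch: build a DynamoDB UpdateItem payload that claims a
/// lock by incrementing the generation, conditioned on the value the
/// claimer last read. DynamoDB evaluates the condition atomically, so if
/// two writers both read generation = 5, exactly one UpdateItem succeeds
/// and the other fails with a ConditionalCheckFailedException.
fn claim_request(table: &str, path: &str, current: u64) -> serde_json::Value {
    json!({
        "TableName": table,
        "Key": { "key": { "S": path } },
        "UpdateExpression": "SET #gen = :next",
        "ConditionExpression": "#gen = :current",
        "ExpressionAttributeNames": { "#gen": "generation" },
        "ExpressionAttributeValues": {
            ":current": { "N": current.to_string() },
            ":next": { "N": (current + 1).to_string() },
        },
    })
}

fn main() {
    println!("{}", claim_request("lock-table", "path/to/object", 5));
}
```

A UUID rvn would achieve the same thing via a condition on the previously observed UUID; either way the safety comes from the conditional write, not from the uniqueness of the value itself.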
##########
object_store/src/aws/dynamo.rs:
##########
@@ -0,0 +1,333 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! A DynamoDB based lock system
+
+use crate::aws::client::S3Client;
+use crate::aws::credential::CredentialExt;
+use crate::client::get::GetClientExt;
+use crate::client::retry::Error as RetryError;
+use crate::client::retry::RetryExt;
+use crate::path::Path;
+use crate::{Error, Result};
+use chrono::Utc;
+use reqwest::StatusCode;
+use serde::ser::SerializeMap;
+use serde::{Deserialize, Serialize, Serializer};
+use std::collections::HashMap;
+use std::time::{Duration, Instant};
+
+/// The timeout for a lease operation
+const LEASE_TIMEOUT: Duration = Duration::from_secs(20);
+
+/// The length of time a lease is valid for
+///
+/// This should be a multiple of [`LEASE_TIMEOUT`] where the multiple determines the maximum
+/// clock skew rate tolerated by the system
+const LEASE_EXPIRY: Duration = Duration::from_secs(60);
+
+/// The TTL offset to encode in DynamoDB
+///
+/// This should be significantly larger than [`LEASE_EXPIRY`] to allow for clock skew
+const LEASE_TTL: Duration = Duration::from_secs(60 * 60);
Review Comment:
What is the rationale for such long timeouts?
The lease TTL is an hour, which seems like a long time, especially given that the lease expiry is 60 seconds and the actual operation being performed (copy) likely takes much less.
I strongly suggest making these values configurable rather than hard-coding them; a possible shape is sketched below.
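A minimal sketch of what that configuration could look like (hypothetical struct, not part of the PR), with defaults mirroring the hard-coded constants in the diff above:

```rust
use std::time::Duration;

/// Hypothetical configuration for the DynamoDB lock timeouts; the struct
/// and field names are illustrative only.
#[derive(Debug, Clone)]
pub struct DynamoLockConfig {
    /// The timeout for a lease operation
    pub lease_timeout: Duration,
    /// The length of time a lease is valid for
    pub lease_expiry: Duration,
    /// The TTL offset to encode in DynamoDB
    pub lease_ttl: Duration,
}

impl Default for DynamoLockConfig {
    fn default() -> Self {
        // Defaults match LEASE_TIMEOUT, LEASE_EXPIRY and LEASE_TTL above
        Self {
            lease_timeout: Duration::from_secs(20),
            lease_expiry: Duration::from_secs(60),
            lease_ttl: Duration::from_secs(60 * 60),
        }
    }
}
```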
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]