openinx commented on a change in pull request #3376:
URL: https://github.com/apache/iceberg/pull/3376#discussion_r808627331



##########
File path: build.gradle
##########
@@ -587,6 +587,25 @@ project(':iceberg-nessie') {
   }
 }
 
+project(':iceberg-dell') {
+  dependencies {
+    implementation project(':iceberg-core')
+    implementation project(':iceberg-common')
+    implementation project(path: ':iceberg-bundled-guava', configuration: 
'shadow')
+    compileOnly 'com.emc.ecs:object-client-bundle'
+
+    testImplementation project(path: ':iceberg-api', configuration: 
'testArtifacts')
+    testImplementation("org.apache.hadoop:hadoop-common") {
+      exclude group: 'org.apache.avro', module: 'avro'
+      exclude group: 'org.slf4j', module: 'slf4j-log4j12'
+    }
+    testImplementation 'com.emc.ecs:object-client-bundle'

Review comment:
       As we've already added this dependency ino `compileOnly` list, then I 
think we don't have to add this line  for test now ? 

##########
File path: dell/src/main/java/org/apache/iceberg/dell/DellProperties.java
##########
@@ -0,0 +1,84 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.dell;
+
+import java.io.Serializable;
+import java.util.Map;
+
+public class DellProperties implements Serializable {
+  /**
+   * S3 Access key id of Dell EMC ECS
+   */
+  public static final String ECS_S3_ACCESS_KEY_ID = "ecs.s3.access-key-id";
+
+  /**
+   * S3 Secret access key of Dell EMC ECS
+   */
+  public static final String ECS_S3_SECRET_ACCESS_KEY = 
"ecs.s3.secret-access-key";

Review comment:
       In dell ecs, we name the secret key as `secret-access-key` ? In our 
alibaba oss, we name it as `access-key-secret`. 
https://www.alibabacloud.com/help/en/doc-detail/29009.htm 
   
   Well,  I see the aws also name it as `secret-access-key`, it is okay to 
follow it for dell ecs. 
https://docs.aws.amazon.com/general/latest/gr/aws-sec-cred-types.html

##########
File path: dell/src/main/java/org/apache/iceberg/dell/DellProperties.java
##########
@@ -0,0 +1,84 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.dell;
+
+import java.io.Serializable;
+import java.util.Map;
+
+public class DellProperties implements Serializable {
+  /**
+   * S3 Access key id of Dell EMC ECS
+   */
+  public static final String ECS_S3_ACCESS_KEY_ID = "ecs.s3.access-key-id";
+
+  /**
+   * S3 Secret access key of Dell EMC ECS
+   */
+  public static final String ECS_S3_SECRET_ACCESS_KEY = 
"ecs.s3.secret-access-key";
+
+  /**
+   * S3 endpoint of Dell EMC ECS
+   */
+  public static final String ECS_S3_ENDPOINT = "ecs.s3.endpoint";
+
+  /**
+   * The implementation class of {@link DellClientFactory} to customize Dell 
client configurations.
+   * If set, all Dell clients will be initialized by the specified factory.
+   * If not set, {@link DellClientFactories.DefaultDellClientFactory} is used 
as default factory.
+   */
+  public static final String CLIENT_FACTORY = "client.factory";
+
+  private String ecsS3Endpoint;
+  private String ecsS3AccessKeyId;
+  private String ecsS3SecretAccessKey;
+
+  public DellProperties() {
+  }
+
+  public DellProperties(Map<String, String> properties) {
+    this.ecsS3AccessKeyId = 
properties.get(DellProperties.ECS_S3_ACCESS_KEY_ID);
+    this.ecsS3SecretAccessKey = 
properties.get(DellProperties.ECS_S3_SECRET_ACCESS_KEY);
+    this.ecsS3Endpoint = properties.get(DellProperties.ECS_S3_ENDPOINT);
+  }
+
+  public String ecsS3Endpoint() {
+    return ecsS3Endpoint;
+  }
+
+  public void setEcsS3Endpoint(String ecsS3Endpoint) {

Review comment:
       Do those set method reference by others ? Seems like we can just remove 
them .

##########
File path: 
dell/src/main/java/org/apache/iceberg/dell/ecs/EcsAppendOutputStream.java
##########
@@ -0,0 +1,149 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.dell.ecs;
+
+import com.emc.object.s3.S3Client;
+import com.emc.object.s3.request.PutObjectRequest;
+import java.io.ByteArrayInputStream;
+import java.nio.ByteBuffer;
+import org.apache.iceberg.io.PositionOutputStream;
+
+/**
+ * Use ECS append API to write data.
+ */
+class EcsAppendOutputStream extends PositionOutputStream {
+
+  private final S3Client client;
+
+  private final EcsURI key;
+
+  /**
+   * Local bytes cache that avoid too many requests
+   * <p>
+   * Use {@link ByteBuffer} to maintain offset.
+   */
+  private final ByteBuffer localCache;
+
+  /**
+   * A marker for data file to put first part instead of append first part.
+   */
+  private boolean firstPart = true;
+
+  /**
+   * Pos for {@link PositionOutputStream}
+   */
+  private long pos;
+
+  private EcsAppendOutputStream(S3Client client, EcsURI key, byte[] 
localCache) {

Review comment:
       Nit: I'd prefer to change the name `key` to `uri`.

##########
File path: dell/src/main/java/org/apache/iceberg/dell/ecs/EcsURI.java
##########
@@ -0,0 +1,104 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.dell.ecs;
+
+import java.net.URI;
+import java.util.Objects;
+import java.util.Set;
+import org.apache.iceberg.exceptions.ValidationException;
+import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
+import org.apache.iceberg.relocated.com.google.common.collect.ImmutableSet;
+
+/**
+ * An immutable record class of ECS location
+ */
+class EcsURI {
+
+  private static final Set<String> VALID_SCHEME = ImmutableSet.of("ecs", "s3", 
"s3a", "s3n");
+
+  private final String location;
+  private final String bucket;
+  private final String name;
+
+  EcsURI(String location) {
+    Preconditions.checkNotNull(location == null, "Location %s can not be 
null", location);
+
+    this.location = location;
+
+    URI uri = URI.create(location);
+    ValidationException.check(
+        VALID_SCHEME.contains(uri.getScheme().toLowerCase()),
+        "Invalid ecs location: %s",
+        location);
+    this.bucket = uri.getHost();
+    this.name = uri.getPath().replaceAll("^/*", "");
+  }
+
+  EcsURI(String bucket, String name) {
+    this.location = String.format("ecs://%s/%s", bucket, name);
+    this.bucket = bucket;
+    this.name = name;
+  }
+
+  /**
+   * Returns ECS bucket name.
+   */
+  public String bucket() {
+    return bucket;
+  }
+
+  /**
+   * Returns ECS object name.
+   */
+  public String name() {
+    return name;
+  }
+
+  /**
+   * Returns original location.
+   */
+  public String location() {
+    return location;
+  }
+
+  @Override
+  public String toString() {
+    return location;
+  }
+
+  @Override
+  public boolean equals(Object o) {
+    if (this == o) {
+      return true;
+    }
+    if (o == null || getClass() != o.getClass()) {
+      return false;
+    }
+    EcsURI ecsURI = (EcsURI) o;
+    return Objects.equals(location, ecsURI.location) &&
+        Objects.equals(bucket, ecsURI.bucket) &&
+        Objects.equals(name, ecsURI.name);
+  }
+
+  @Override
+  public int hashCode() {
+    return Objects.hash(location, bucket, name);

Review comment:
       Ditto.

##########
File path: 
dell/src/main/java/org/apache/iceberg/dell/ecs/EcsAppendOutputStream.java
##########
@@ -0,0 +1,149 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.dell.ecs;
+
+import com.emc.object.s3.S3Client;
+import com.emc.object.s3.request.PutObjectRequest;
+import java.io.ByteArrayInputStream;
+import java.nio.ByteBuffer;
+import org.apache.iceberg.io.PositionOutputStream;
+
+/**
+ * Use ECS append API to write data.
+ */
+class EcsAppendOutputStream extends PositionOutputStream {
+
+  private final S3Client client;
+
+  private final EcsURI key;
+
+  /**
+   * Local bytes cache that avoid too many requests
+   * <p>
+   * Use {@link ByteBuffer} to maintain offset.
+   */
+  private final ByteBuffer localCache;
+
+  /**
+   * A marker for data file to put first part instead of append first part.
+   */
+  private boolean firstPart = true;
+
+  /**
+   * Pos for {@link PositionOutputStream}
+   */
+  private long pos;
+
+  private EcsAppendOutputStream(S3Client client, EcsURI key, byte[] 
localCache) {
+    this.client = client;
+    this.key = key;
+    this.localCache = ByteBuffer.wrap(localCache);
+  }
+
+  /**
+   * Use built-in 1 KiB byte buffer
+   */
+  static EcsAppendOutputStream create(S3Client client, EcsURI uri) {
+    return createWithBufferSize(client, uri, 1024);
+  }
+
+  /**
+   * Create {@link PositionOutputStream} with specific buffer size.
+   */
+  static EcsAppendOutputStream createWithBufferSize(S3Client client, EcsURI 
uri, int size) {
+    return new EcsAppendOutputStream(client, uri, new byte[size]);
+  }
+
+  /**
+   * Write a byte. If buffer is full, upload the buffer.
+   */
+  @Override
+  public void write(int b) {
+    if (!checkBuffer(1)) {
+      flush();
+    }
+
+    localCache.put((byte) b);
+    pos += 1;
+  }
+
+  /**
+   * Write a byte.
+   * If buffer is full, upload the buffer.
+   * If buffer size &lt; input bytes, upload input bytes.
+   */
+  @Override
+  public void write(byte[] b, int off, int len) {
+    if (!checkBuffer(len)) {
+      flush();
+    }
+
+    if (checkBuffer(len)) {
+      localCache.put(b, off, len);
+    } else {
+      // if content > cache, directly flush itself.
+      flushBuffer(b, off, len);
+    }
+
+    pos += len;
+  }
+
+  private boolean checkBuffer(int nextWrite) {
+    return localCache.remaining() >= nextWrite;
+  }
+
+  private void flushBuffer(byte[] buffer, int offset, int length) {
+    if (firstPart) {
+      client.putObject(new PutObjectRequest(key.bucket(), key.name(),
+          new ByteArrayInputStream(buffer, offset, length)));
+      firstPart = false;
+    } else {
+      client.appendObject(key.bucket(), key.name(), new 
ByteArrayInputStream(buffer, offset, length));

Review comment:
       So seems the dell ecs provide the api `appendObject`, which means append 
few bytes into the closed object, right ?   But I just curious why we don't 
append the bytes into the opening outputstream for the non-first parts , while 
this PR prefer to append the second part into the reopened outpustream ? 

##########
File path: dell/src/main/java/org/apache/iceberg/dell/ecs/EcsURI.java
##########
@@ -0,0 +1,104 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.dell.ecs;
+
+import java.net.URI;
+import java.util.Objects;
+import java.util.Set;
+import org.apache.iceberg.exceptions.ValidationException;
+import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
+import org.apache.iceberg.relocated.com.google.common.collect.ImmutableSet;
+
+/**
+ * An immutable record class of ECS location
+ */
+class EcsURI {
+
+  private static final Set<String> VALID_SCHEME = ImmutableSet.of("ecs", "s3", 
"s3a", "s3n");
+
+  private final String location;
+  private final String bucket;
+  private final String name;
+
+  EcsURI(String location) {
+    Preconditions.checkNotNull(location == null, "Location %s can not be 
null", location);
+
+    this.location = location;
+
+    URI uri = URI.create(location);
+    ValidationException.check(
+        VALID_SCHEME.contains(uri.getScheme().toLowerCase()),
+        "Invalid ecs location: %s",
+        location);
+    this.bucket = uri.getHost();
+    this.name = uri.getPath().replaceAll("^/*", "");
+  }
+
+  EcsURI(String bucket, String name) {
+    this.location = String.format("ecs://%s/%s", bucket, name);
+    this.bucket = bucket;
+    this.name = name;
+  }
+
+  /**
+   * Returns ECS bucket name.
+   */
+  public String bucket() {
+    return bucket;
+  }
+
+  /**
+   * Returns ECS object name.
+   */
+  public String name() {
+    return name;
+  }
+
+  /**
+   * Returns original location.
+   */
+  public String location() {
+    return location;
+  }
+
+  @Override
+  public String toString() {
+    return location;
+  }
+
+  @Override
+  public boolean equals(Object o) {
+    if (this == o) {
+      return true;
+    }
+    if (o == null || getClass() != o.getClass()) {
+      return false;
+    }
+    EcsURI ecsURI = (EcsURI) o;
+    return Objects.equals(location, ecsURI.location) &&

Review comment:
       For the two uri `ecs://bucket/key` , `s3://bucket/key`,  should them be 
thought as the same EcsURI ?  If sure, then I think we shouldn't add the 
location equality checker in this line. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to