liuml07 commented on a change in pull request #2530:
URL: https://github.com/apache/hadoop/pull/2530#discussion_r554779575
##########
File path:
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Constants.java
##########
@@ -1048,4 +1048,10 @@ private Constants() {
public static final String STORE_CAPABILITY_DIRECTORY_MARKER_ACTION_DELETE
= "fs.s3a.capability.directory.marker.action.delete";
+ /**
+ * To comply with the XAttr rules, all headers of the object retrieved
+ * through the getXAttr APIs have the prefix: {@value}.
+ */
+ public static final String HEADER_PREFIX = "header.";
Review comment:
nit: would `XA_HEADER_PREFIX` be a slightly clearer name?
##########
File path:
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java
##########
@@ -4103,56 +4111,7 @@ private ObjectMetadata
cloneObjectMetadata(ObjectMetadata source) {
// in future there are new attributes added to ObjectMetadata
// that we do not explicitly call to set here
ObjectMetadata ret = newObjectMetadata(source.getContentLength());
-
- // Possibly null attributes
- // Allowing nulls to pass breaks it during later use
- if (source.getCacheControl() != null) {
- ret.setCacheControl(source.getCacheControl());
- }
- if (source.getContentDisposition() != null) {
- ret.setContentDisposition(source.getContentDisposition());
- }
- if (source.getContentEncoding() != null) {
- ret.setContentEncoding(source.getContentEncoding());
- }
- if (source.getContentMD5() != null) {
- ret.setContentMD5(source.getContentMD5());
- }
- if (source.getContentType() != null) {
- ret.setContentType(source.getContentType());
- }
- if (source.getExpirationTime() != null) {
- ret.setExpirationTime(source.getExpirationTime());
- }
- if (source.getExpirationTimeRuleId() != null) {
- ret.setExpirationTimeRuleId(source.getExpirationTimeRuleId());
- }
- if (source.getHttpExpiresDate() != null) {
- ret.setHttpExpiresDate(source.getHttpExpiresDate());
- }
- if (source.getLastModified() != null) {
- ret.setLastModified(source.getLastModified());
- }
- if (source.getOngoingRestore() != null) {
- ret.setOngoingRestore(source.getOngoingRestore());
- }
- if (source.getRestoreExpirationTime() != null) {
- ret.setRestoreExpirationTime(source.getRestoreExpirationTime());
- }
- if (source.getSSEAlgorithm() != null) {
- ret.setSSEAlgorithm(source.getSSEAlgorithm());
- }
- if (source.getSSECustomerAlgorithm() != null) {
- ret.setSSECustomerAlgorithm(source.getSSECustomerAlgorithm());
- }
- if (source.getSSECustomerKeyMd5() != null) {
- ret.setSSECustomerKeyMd5(source.getSSECustomerKeyMd5());
- }
-
- for (Map.Entry<String, String> e : source.getUserMetadata().entrySet()) {
- ret.addUserMetadata(e.getKey(), e.getValue());
- }
- return ret;
+ return getHeaderProcessing().cloneObjectMetadata(source, ret);
Review comment:
nit: seems better if we move those comments above into the
implementation method `HeaderProcessing#cloneObjectMetadata`
```
// This approach may be too brittle, especially if
// in future there are new attributes added to ObjectMetadata
// that we do not explicitly call to set here
```
##########
File path:
hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/committer_architecture.md
##########
@@ -1312,6 +1312,16 @@ On `close()`, summary data would be written to the file
`/results/latest/__magic/job400_1/task_01_01/latest.orc.lzo.pending`.
This would contain the upload ID and all the parts and etags of uploaded data.
+A marker file is also created, so that code which verifies that a newly
created file
+exists does not fail.
+1. These marker files are zero bytes long.
+1. They declare the full length of the final file in the HTTP header
+ `x-hadoop-s3a-magic-marker`.
Review comment:
Did you mean `x-hadoop-s3a-magic-data-length` here?
##########
File path:
hadoop-common-project/hadoop-common/src/main/resources/core-default.xml
##########
@@ -1873,11 +1873,9 @@
<property>
<name>fs.s3a.committer.magic.enabled</name>
- <value>false</value>
+ <value>true</value>
Review comment:
I'm +1 on enabling this by default as it does not depend on S3Guard any
more. I totally agree we should update the JIRA release notes or commit message
to indicate this default config change. Or ideally a separate PR.
##########
File path:
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/impl/HeaderProcessing.java
##########
@@ -0,0 +1,300 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.s3a.impl;
+
+import java.io.IOException;
+import java.nio.charset.StandardCharsets;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+import java.util.Optional;
+import java.util.TreeMap;
+
+import com.amazonaws.services.s3.Headers;
+import com.amazonaws.services.s3.model.ObjectMetadata;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.commons.lang3.StringUtils;
+import org.apache.hadoop.fs.Path;
+
+import static org.apache.hadoop.fs.s3a.Constants.HEADER_PREFIX;
+import static
org.apache.hadoop.fs.s3a.commit.CommitConstants.X_HEADER_MAGIC_MARKER;
+
+/**
+ * Part of the S3A FS where object headers are
+ * processed.
+ * Implements all the various XAttr read operations.
+ * Those APIs all expect byte arrays back.
+ * Metadata cloning is also implemented here, so as
+ * to stay in sync with custom header logic.
+ */
+public class HeaderProcessing extends AbstractStoreOperation {
+
+ private static final Logger LOG = LoggerFactory.getLogger(
+ HeaderProcessing.class);
+
+ private static final byte[] EMPTY = new byte[0];
+
+ /**
+ * Length XAttr.
+ */
+ public static final String XA_CONTENT_LENGTH =
+ HEADER_PREFIX + Headers.CONTENT_LENGTH;
+
+ /**
+ * last modified XAttr.
+ */
+ public static final String XA_LAST_MODIFIED =
+ HEADER_PREFIX + Headers.LAST_MODIFIED;
+
+ public static final String XA_CONTENT_DISPOSITION =
+ HEADER_PREFIX + Headers.CONTENT_DISPOSITION;
+
+ public static final String XA_CONTENT_ENCODING =
+ HEADER_PREFIX + Headers.CONTENT_ENCODING;
+
+ public static final String XA_CONTENT_LANGUAGE =
+ HEADER_PREFIX + Headers.CONTENT_LANGUAGE;
+
+ public static final String XA_CONTENT_MD5 =
+ HEADER_PREFIX + Headers.CONTENT_MD5;
+
+ public static final String XA_CONTENT_RANGE =
+ HEADER_PREFIX + Headers.CONTENT_RANGE;
+
+ public static final String XA_CONTENT_TYPE =
+ HEADER_PREFIX + Headers.CONTENT_TYPE;
+
+ public static final String XA_ETAG = HEADER_PREFIX + Headers.ETAG;
+
+ public HeaderProcessing(final StoreContext storeContext) {
+ super(storeContext);
+ }
+
+ /**
+ * Query the store, get all the headers into a map. Each Header
+ * has the "header." prefix.
+ * Caller must have read access.
+ * The value of each header is the string value of the object
+ * UTF-8 encoded.
+ * @param path path of object.
+ * @return the headers
+ * @throws IOException failure, including file not found.
+ */
+ private Map<String, byte[]> retrieveHeaders(Path path) throws IOException {
+ StoreContext context = getStoreContext();
+ ObjectMetadata md = context.getContextAccessors()
+ .getObjectMetadata(path);
+ Map<String, String> rawHeaders = md.getUserMetadata();
+ Map<String, byte[]> headers = new TreeMap<>();
+ rawHeaders.forEach((key, value) ->
+ headers.put(HEADER_PREFIX + key, encodeBytes(value)));
+ // and add the usual content length &c, if set
+ headers.put(XA_CONTENT_DISPOSITION,
+ encodeBytes(md.getContentDisposition()));
+ headers.put(XA_CONTENT_ENCODING,
+ encodeBytes(md.getContentEncoding()));
+ headers.put(XA_CONTENT_LANGUAGE,
+ encodeBytes(md.getContentLanguage()));
+ headers.put(XA_CONTENT_LENGTH,
+ encodeBytes(md.getContentLength()));
+ headers.put(
+ XA_CONTENT_MD5,
+ encodeBytes(md.getContentMD5()));
+ headers.put(XA_CONTENT_RANGE,
+ encodeBytes(md.getContentRange()));
+ headers.put(XA_CONTENT_TYPE,
+ encodeBytes(md.getContentType()));
+ headers.put(XA_ETAG,
+ encodeBytes(md.getETag()));
+ headers.put(XA_LAST_MODIFIED,
+ encodeBytes(md.getLastModified()));
+ return headers;
+ }
+
+ /**
+ * Stringify an object and return its bytes in UTF-8 encoding.
+ * @param s source
+ * @return encoded object or null
Review comment:
This will never return `null`, but it can return an empty byte array — should the `@return` javadoc say "encoded object or empty array" instead?
##########
File path:
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java
##########
@@ -4382,6 +4341,37 @@ public EtagChecksum getFileChecksum(Path f, final long
length)
}
}
+ /**
+ * Get header processing support.
+ * @return the header processing of this instance.
+ */
+ private HeaderProcessing getHeaderProcessing() {
+ return headerProcessing;
+ }
+
+ @Override
+ public byte[] getXAttr(final Path path, final String name)
+ throws IOException {
+ return getHeaderProcessing().getXAttr(path, name);
+ }
+
+ @Override
+ public Map<String, byte[]> getXAttrs(final Path path) throws IOException {
+ return getHeaderProcessing().getXAttrs(path);
+ }
+
+ @Override
+ public Map<String, byte[]> getXAttrs(final Path path,
+ final List<String> names)
+ throws IOException {
+ return getHeaderProcessing().getXAttrs(path, names);
+ }
+
+ @Override
+ public List<String> listXAttrs(final Path path) throws IOException {
+ return headerProcessing.listXAttrs(path);
Review comment:
nit: replace the direct `headerProcessing` field access with the private
`getHeaderProcessing()` accessor, as is done in the `getXAttrs` methods?
##########
File path:
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/impl/HeaderProcessing.java
##########
@@ -0,0 +1,300 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.s3a.impl;
+
+import java.io.IOException;
+import java.nio.charset.StandardCharsets;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+import java.util.Optional;
+import java.util.TreeMap;
+
+import com.amazonaws.services.s3.Headers;
+import com.amazonaws.services.s3.model.ObjectMetadata;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.commons.lang3.StringUtils;
+import org.apache.hadoop.fs.Path;
+
+import static org.apache.hadoop.fs.s3a.Constants.HEADER_PREFIX;
+import static
org.apache.hadoop.fs.s3a.commit.CommitConstants.X_HEADER_MAGIC_MARKER;
+
+/**
+ * Part of the S3A FS where object headers are
+ * processed.
+ * Implements all the various XAttr read operations.
+ * Those APIs all expect byte arrays back.
+ * Metadata cloning is also implemented here, so as
+ * to stay in sync with custom header logic.
+ */
+public class HeaderProcessing extends AbstractStoreOperation {
+
+ private static final Logger LOG = LoggerFactory.getLogger(
+ HeaderProcessing.class);
+
+ private static final byte[] EMPTY = new byte[0];
+
+ /**
+ * Length XAttr.
+ */
+ public static final String XA_CONTENT_LENGTH =
+ HEADER_PREFIX + Headers.CONTENT_LENGTH;
+
+ /**
+ * last modified XAttr.
+ */
+ public static final String XA_LAST_MODIFIED =
+ HEADER_PREFIX + Headers.LAST_MODIFIED;
+
+ public static final String XA_CONTENT_DISPOSITION =
+ HEADER_PREFIX + Headers.CONTENT_DISPOSITION;
+
+ public static final String XA_CONTENT_ENCODING =
+ HEADER_PREFIX + Headers.CONTENT_ENCODING;
+
+ public static final String XA_CONTENT_LANGUAGE =
+ HEADER_PREFIX + Headers.CONTENT_LANGUAGE;
+
+ public static final String XA_CONTENT_MD5 =
+ HEADER_PREFIX + Headers.CONTENT_MD5;
+
+ public static final String XA_CONTENT_RANGE =
+ HEADER_PREFIX + Headers.CONTENT_RANGE;
+
+ public static final String XA_CONTENT_TYPE =
+ HEADER_PREFIX + Headers.CONTENT_TYPE;
+
+ public static final String XA_ETAG = HEADER_PREFIX + Headers.ETAG;
+
+ public HeaderProcessing(final StoreContext storeContext) {
+ super(storeContext);
+ }
+
+ /**
+ * Query the store, get all the headers into a map. Each Header
+ * has the "header." prefix.
+ * Caller must have read access.
+ * The value of each header is the string value of the object
+ * UTF-8 encoded.
+ * @param path path of object.
+ * @return the headers
+ * @throws IOException failure, including file not found.
+ */
+ private Map<String, byte[]> retrieveHeaders(Path path) throws IOException {
+ StoreContext context = getStoreContext();
+ ObjectMetadata md = context.getContextAccessors()
+ .getObjectMetadata(path);
+ Map<String, String> rawHeaders = md.getUserMetadata();
+ Map<String, byte[]> headers = new TreeMap<>();
+ rawHeaders.forEach((key, value) ->
+ headers.put(HEADER_PREFIX + key, encodeBytes(value)));
+ // and add the usual content length &c, if set
+ headers.put(XA_CONTENT_DISPOSITION,
+ encodeBytes(md.getContentDisposition()));
+ headers.put(XA_CONTENT_ENCODING,
+ encodeBytes(md.getContentEncoding()));
+ headers.put(XA_CONTENT_LANGUAGE,
+ encodeBytes(md.getContentLanguage()));
+ headers.put(XA_CONTENT_LENGTH,
+ encodeBytes(md.getContentLength()));
+ headers.put(
+ XA_CONTENT_MD5,
+ encodeBytes(md.getContentMD5()));
+ headers.put(XA_CONTENT_RANGE,
+ encodeBytes(md.getContentRange()));
+ headers.put(XA_CONTENT_TYPE,
+ encodeBytes(md.getContentType()));
+ headers.put(XA_ETAG,
+ encodeBytes(md.getETag()));
+ headers.put(XA_LAST_MODIFIED,
+ encodeBytes(md.getLastModified()));
+ return headers;
+ }
+
+ /**
+ * Stringify an object and return its bytes in UTF-8 encoding.
+ * @param s source
+ * @return encoded object or null
+ */
+ public static byte[] encodeBytes(Object s) {
+ return s == null
+ ? EMPTY
+ : s.toString().getBytes(StandardCharsets.UTF_8);
+ }
+
+ /**
+ * Get the string value from the bytes.
+ * if null : return null, otherwise the UTF-8 decoded
+ * bytes.
+ * @param bytes source bytes
+ * @return decoded value
+ */
+ public static String decodeBytes(byte[] bytes) {
+ return bytes == null
+ ? null
+ : new String(bytes, StandardCharsets.UTF_8);
+ }
+
+ /**
+ * Get an XAttr name and value for a file or directory.
+ * @param path Path to get extended attribute
+ * @param name XAttr name.
+ * @return byte[] XAttr value or null
+ * @throws IOException IO failure
+ * @throws UnsupportedOperationException if the operation is unsupported
+ * (default outcome).
+ */
+ public byte[] getXAttr(Path path, String name) throws IOException {
+ return retrieveHeaders(path).get(name);
+ }
+
+ /**
+ * See {@code FileSystem.getXAttrs(path}.
+ *
+ * @param path Path to get extended attributes
+ * @return Map describing the XAttrs of the file or directory
+ * @throws IOException IO failure
+ * @throws UnsupportedOperationException if the operation is unsupported
+ * (default outcome).
+ */
+ public Map<String, byte[]> getXAttrs(Path path) throws IOException {
+ return retrieveHeaders(path);
+ }
+
+ /**
+ * See {@code FileSystem.listXAttrs(path)}.
+ * @param path Path to get extended attributes
+ * @return List of supported XAttrs
+ * @throws IOException IO failure
+ */
+ public List<String> listXAttrs(final Path path) throws IOException {
+ return new ArrayList<>(retrieveHeaders(path).keySet());
+ }
+
+ /**
+ * See {@code FileSystem.getXAttrs(path, names}.
+ * @param path Path to get extended attributes
+ * @param names XAttr names.
+ * @return Map describing the XAttrs of the file or directory
+ * @throws IOException IO failure
+ */
+ public Map<String, byte[]> getXAttrs(Path path, List<String> names)
+ throws IOException {
+ Map<String, byte[]> headers = retrieveHeaders(path);
+ Map<String, byte[]> result = new TreeMap<>();
+ headers.entrySet().stream()
+ .filter(entry -> names.contains(entry.getKey()))
+ .forEach(entry -> result.put(entry.getKey(), entry.getValue()));
+ return result;
+ }
+
+ /**
+ * Convert an XAttr byte array to a long.
+ * testability.
+ * @param data data to parse
+ * @return either a length or none
+ */
+ public static Optional<Long> extractXAttrLongValue(byte[] data) {
+ String xAttr;
+ xAttr = HeaderProcessing.decodeBytes(data);
+ if (StringUtils.isNotEmpty(xAttr)) {
+ try {
+ long l = Long.parseLong(xAttr);
+ if (l >= 0) {
+ return Optional.of(l);
+ }
+ } catch (NumberFormatException ex) {
+ LOG.warn("Not a number: {}", xAttr);
Review comment:
Would it be useful to also log `ex`? For example:
`LOG.debug("Failed to parse {} as a long", xAttr, ex)`
##########
File path:
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/impl/HeaderProcessing.java
##########
@@ -0,0 +1,300 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.s3a.impl;
+
+import java.io.IOException;
+import java.nio.charset.StandardCharsets;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+import java.util.Optional;
+import java.util.TreeMap;
+
+import com.amazonaws.services.s3.Headers;
+import com.amazonaws.services.s3.model.ObjectMetadata;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.commons.lang3.StringUtils;
+import org.apache.hadoop.fs.Path;
+
+import static org.apache.hadoop.fs.s3a.Constants.HEADER_PREFIX;
+import static
org.apache.hadoop.fs.s3a.commit.CommitConstants.X_HEADER_MAGIC_MARKER;
+
+/**
+ * Part of the S3A FS where object headers are
+ * processed.
+ * Implements all the various XAttr read operations.
+ * Those APIs all expect byte arrays back.
+ * Metadata cloning is also implemented here, so as
+ * to stay in sync with custom header logic.
+ */
+public class HeaderProcessing extends AbstractStoreOperation {
+
+ private static final Logger LOG = LoggerFactory.getLogger(
+ HeaderProcessing.class);
+
+ private static final byte[] EMPTY = new byte[0];
+
+ /**
+ * Length XAttr.
+ */
+ public static final String XA_CONTENT_LENGTH =
+ HEADER_PREFIX + Headers.CONTENT_LENGTH;
+
+ /**
+ * last modified XAttr.
+ */
+ public static final String XA_LAST_MODIFIED =
+ HEADER_PREFIX + Headers.LAST_MODIFIED;
+
+ public static final String XA_CONTENT_DISPOSITION =
+ HEADER_PREFIX + Headers.CONTENT_DISPOSITION;
+
+ public static final String XA_CONTENT_ENCODING =
+ HEADER_PREFIX + Headers.CONTENT_ENCODING;
+
+ public static final String XA_CONTENT_LANGUAGE =
+ HEADER_PREFIX + Headers.CONTENT_LANGUAGE;
+
+ public static final String XA_CONTENT_MD5 =
+ HEADER_PREFIX + Headers.CONTENT_MD5;
+
+ public static final String XA_CONTENT_RANGE =
+ HEADER_PREFIX + Headers.CONTENT_RANGE;
+
+ public static final String XA_CONTENT_TYPE =
+ HEADER_PREFIX + Headers.CONTENT_TYPE;
+
+ public static final String XA_ETAG = HEADER_PREFIX + Headers.ETAG;
+
+ public HeaderProcessing(final StoreContext storeContext) {
+ super(storeContext);
+ }
+
+ /**
+ * Query the store, get all the headers into a map. Each Header
+ * has the "header." prefix.
+ * Caller must have read access.
+ * The value of each header is the string value of the object
+ * UTF-8 encoded.
+ * @param path path of object.
+ * @return the headers
+ * @throws IOException failure, including file not found.
+ */
+ private Map<String, byte[]> retrieveHeaders(Path path) throws IOException {
+ StoreContext context = getStoreContext();
+ ObjectMetadata md = context.getContextAccessors()
+ .getObjectMetadata(path);
+ Map<String, String> rawHeaders = md.getUserMetadata();
+ Map<String, byte[]> headers = new TreeMap<>();
+ rawHeaders.forEach((key, value) ->
+ headers.put(HEADER_PREFIX + key, encodeBytes(value)));
+ // and add the usual content length &c, if set
+ headers.put(XA_CONTENT_DISPOSITION,
+ encodeBytes(md.getContentDisposition()));
+ headers.put(XA_CONTENT_ENCODING,
+ encodeBytes(md.getContentEncoding()));
+ headers.put(XA_CONTENT_LANGUAGE,
+ encodeBytes(md.getContentLanguage()));
+ headers.put(XA_CONTENT_LENGTH,
+ encodeBytes(md.getContentLength()));
+ headers.put(
+ XA_CONTENT_MD5,
+ encodeBytes(md.getContentMD5()));
+ headers.put(XA_CONTENT_RANGE,
+ encodeBytes(md.getContentRange()));
+ headers.put(XA_CONTENT_TYPE,
+ encodeBytes(md.getContentType()));
+ headers.put(XA_ETAG,
+ encodeBytes(md.getETag()));
+ headers.put(XA_LAST_MODIFIED,
+ encodeBytes(md.getLastModified()));
+ return headers;
+ }
+
+ /**
+ * Stringify an object and return its bytes in UTF-8 encoding.
+ * @param s source
+ * @return encoded object or null
+ */
+ public static byte[] encodeBytes(Object s) {
+ return s == null
+ ? EMPTY
+ : s.toString().getBytes(StandardCharsets.UTF_8);
+ }
+
+ /**
+ * Get the string value from the bytes.
+ * if null : return null, otherwise the UTF-8 decoded
+ * bytes.
+ * @param bytes source bytes
+ * @return decoded value
+ */
+ public static String decodeBytes(byte[] bytes) {
+ return bytes == null
+ ? null
+ : new String(bytes, StandardCharsets.UTF_8);
+ }
+
+ /**
+ * Get an XAttr name and value for a file or directory.
+ * @param path Path to get extended attribute
+ * @param name XAttr name.
+ * @return byte[] XAttr value or null
+ * @throws IOException IO failure
+ * @throws UnsupportedOperationException if the operation is unsupported
+ * (default outcome).
+ */
+ public byte[] getXAttr(Path path, String name) throws IOException {
+ return retrieveHeaders(path).get(name);
+ }
+
+ /**
+ * See {@code FileSystem.getXAttrs(path}.
+ *
+ * @param path Path to get extended attributes
+ * @return Map describing the XAttrs of the file or directory
+ * @throws IOException IO failure
+ * @throws UnsupportedOperationException if the operation is unsupported
+ * (default outcome).
+ */
+ public Map<String, byte[]> getXAttrs(Path path) throws IOException {
+ return retrieveHeaders(path);
+ }
+
+ /**
+ * See {@code FileSystem.listXAttrs(path)}.
+ * @param path Path to get extended attributes
+ * @return List of supported XAttrs
+ * @throws IOException IO failure
+ */
+ public List<String> listXAttrs(final Path path) throws IOException {
+ return new ArrayList<>(retrieveHeaders(path).keySet());
+ }
+
+ /**
+ * See {@code FileSystem.getXAttrs(path, names}.
+ * @param path Path to get extended attributes
+ * @param names XAttr names.
+ * @return Map describing the XAttrs of the file or directory
+ * @throws IOException IO failure
+ */
+ public Map<String, byte[]> getXAttrs(Path path, List<String> names)
+ throws IOException {
+ Map<String, byte[]> headers = retrieveHeaders(path);
+ Map<String, byte[]> result = new TreeMap<>();
+ headers.entrySet().stream()
+ .filter(entry -> names.contains(entry.getKey()))
+ .forEach(entry -> result.put(entry.getKey(), entry.getValue()));
+ return result;
+ }
+
+ /**
+ * Convert an XAttr byte array to a long.
+ * testability.
+ * @param data data to parse
+ * @return either a length or none
+ */
+ public static Optional<Long> extractXAttrLongValue(byte[] data) {
+ String xAttr;
+ xAttr = HeaderProcessing.decodeBytes(data);
+ if (StringUtils.isNotEmpty(xAttr)) {
+ try {
+ long l = Long.parseLong(xAttr);
+ if (l >= 0) {
+ return Optional.of(l);
+ }
+ } catch (NumberFormatException ex) {
+ LOG.warn("Not a number: {}", xAttr);
+ }
+ }
+ // missing/empty header or parse failure.
+ return Optional.empty();
+ }
+
+ /**
+ * Creates a copy of the passed {@link ObjectMetadata}.
+ * Does so without using the {@link ObjectMetadata#clone()} method,
+ * to avoid copying unnecessary headers.
+ * This operation does not copy the {@code X_HEADER_MAGIC_MARKER}
+ * header to avoid confusion. If a marker file is renamed,
+ * it loses information about any remapped file.
+ * @param source the {@link ObjectMetadata} to copy
+ * @param ret the metadata to update; this is the return value.
+ * @return a copy of {@link ObjectMetadata} with only relevant attributes
+ */
+ public ObjectMetadata cloneObjectMetadata(ObjectMetadata source,
+ ObjectMetadata ret) {
+
+ // Possibly null attributes
+ // Allowing nulls to pass breaks it during later use
+ if (source.getCacheControl() != null) {
+ ret.setCacheControl(source.getCacheControl());
+ }
+ if (source.getContentDisposition() != null) {
+ ret.setContentDisposition(source.getContentDisposition());
+ }
+ if (source.getContentEncoding() != null) {
+ ret.setContentEncoding(source.getContentEncoding());
+ }
+ if (source.getContentMD5() != null) {
+ ret.setContentMD5(source.getContentMD5());
+ }
+ if (source.getContentType() != null) {
+ ret.setContentType(source.getContentType());
+ }
+ if (source.getExpirationTime() != null) {
+ ret.setExpirationTime(source.getExpirationTime());
+ }
+ if (source.getExpirationTimeRuleId() != null) {
+ ret.setExpirationTimeRuleId(source.getExpirationTimeRuleId());
+ }
+ if (source.getHttpExpiresDate() != null) {
+ ret.setHttpExpiresDate(source.getHttpExpiresDate());
+ }
+ if (source.getLastModified() != null) {
+ ret.setLastModified(source.getLastModified());
+ }
+ if (source.getOngoingRestore() != null) {
+ ret.setOngoingRestore(source.getOngoingRestore());
+ }
+ if (source.getRestoreExpirationTime() != null) {
+ ret.setRestoreExpirationTime(source.getRestoreExpirationTime());
+ }
+ if (source.getSSEAlgorithm() != null) {
+ ret.setSSEAlgorithm(source.getSSEAlgorithm());
+ }
+ if (source.getSSECustomerAlgorithm() != null) {
+ ret.setSSECustomerAlgorithm(source.getSSECustomerAlgorithm());
+ }
+ if (source.getSSECustomerKeyMd5() != null) {
+ ret.setSSECustomerKeyMd5(source.getSSECustomerKeyMd5());
+ }
+
+ // copy user metadata except the magic marker header.
+ for (Map.Entry<String, String> e : source.getUserMetadata().entrySet()) {
Review comment:
nit: I know this is based on existing code, but could this loop be
rewritten as a Java 8 stream?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]