[
https://issues.apache.org/jira/browse/HADOOP-18706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17810868#comment-17810868
]
ASF GitHub Bot commented on HADOOP-18706:
-----------------------------------------
steveloughran commented on code in PR #5771:
URL: https://github.com/apache/hadoop/pull/5771#discussion_r1466282702
##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java:
##########
@@ -1369,12 +1369,69 @@ public S3AEncryptionMethods getS3EncryptionAlgorithm() {
File createTmpFileForWrite(String pathStr, long size,
Configuration conf) throws IOException {
initLocalDirAllocatorIfNotInitialized(conf);
- Path path = directoryAllocator.getLocalPathForWrite(pathStr,
- size, conf);
+ Path path = directoryAllocator.getLocalPathForWrite(pathStr, size, conf);
File dir = new File(path.getParent().toUri().getPath());
- String prefix = path.getName();
- // create a temp file on this directory
- return File.createTempFile(prefix, null, dir);
+ return safeCreateTempFile(pathStr, null, dir);
+ }
+
+ // TODO remove this method when hadoop upgrades to a newer version of java than 1.8
+ /**
+ * Ensure that the temp file prefix and suffix don't exceed the maximum number of characters
+ * allowed by the underlying file system. This validation isn't required in Java 9+ since
+ * {@link java.io.File#createTempFile(String, String, File)} automatically truncates file names.
+ *
+ * @param prefix prefix for the temporary file
+ * @param suffix suffix for the temporary file
+ * @param dir directory to create the temporary file in
+ * @return a unique temporary file
+ * @throws IOException
+ */
+ static File safeCreateTempFile(String prefix, String suffix, File dir) throws IOException
+ {
+ // avoid validating multiple times.
+ // if the jvm running is version 9+ then defer to java.io.File validation implementation
+ if(Float.parseFloat(System.getProperty("java.class.version")) >= 53) {
+ return File.createTempFile(prefix, null, dir);
+ }
+
+ // if no suffix was defined assume the default
+ if(suffix == null) {
+ suffix = ".tmp";
+ }
+ // Use only the file name from the supplied prefix
+ prefix = (new File(prefix)).getName();
+
+ int prefixLength = prefix.length();
+ int suffixLength = suffix.length();
+ int maxRandomSuffixLen = 19; // Long.toUnsignedString(Long.MAX_VALUE).length()
+
+ String name;
+ int nameMax = 255; // unable to access the underlying FS directly, so assume 255
Review Comment:
make a constant, e.g. ASSUMED_MAX_FILENAME
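For illustration, a sketch of that constant (only the name ASSUMED_MAX_FILENAME comes from the review; its placement and use here are assumptions):

```java
// Maximum file name length assumed when the local FS limit cannot be queried directly.
private static final int ASSUMED_MAX_FILENAME = 255;

// both truncation checks in safeCreateTempFile() would then reference the same value:
int excess = prefixLength + maxRandomSuffixLen + suffixLength - ASSUMED_MAX_FILENAME;
```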
##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java:
##########
@@ -1369,12 +1369,69 @@ public S3AEncryptionMethods getS3EncryptionAlgorithm() {
File createTmpFileForWrite(String pathStr, long size,
Configuration conf) throws IOException {
initLocalDirAllocatorIfNotInitialized(conf);
- Path path = directoryAllocator.getLocalPathForWrite(pathStr,
- size, conf);
+ Path path = directoryAllocator.getLocalPathForWrite(pathStr, size, conf);
File dir = new File(path.getParent().toUri().getPath());
- String prefix = path.getName();
- // create a temp file on this directory
- return File.createTempFile(prefix, null, dir);
+ return safeCreateTempFile(pathStr, null, dir);
+ }
+
+ // TODO remove this method when hadoop upgrades to a newer version of java than 1.8
+ /**
+ * Ensure that the temp file prefix and suffix don't exceed the maximum number of characters
+ * allowed by the underlying file system. This validation isn't required in Java 9+ since
+ * {@link java.io.File#createTempFile(String, String, File)} automatically truncates file names.
+ *
+ * @param prefix prefix for the temporary file
+ * @param suffix suffix for the temporary file
+ * @param dir directory to create the temporary file in
+ * @return a unique temporary file
+ * @throws IOException
+ */
+ static File safeCreateTempFile(String prefix, String suffix, File dir) throws IOException
+ {
+ // avoid validating multiple times.
+ // if the jvm running is version 9+ then defer to java.io.File validation implementation
+ if(Float.parseFloat(System.getProperty("java.class.version")) >= 53) {
Review Comment:
this should go in org.apache.hadoop.util.Shell; there's already something
similar. will need a test somehow.
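A minimal sketch of that move, assuming org.apache.hadoop.util.Shell provides (or gains) an isJavaVersionAtLeast(int) probe, as the "already something similar" remark suggests; the pre-Java-9 truncation path is elided:

```java
import java.io.File;
import java.io.IOException;

import org.apache.hadoop.util.Shell;

// Delegate the JVM version check to Shell instead of parsing java.class.version inline.
static File safeCreateTempFile(String prefix, String suffix, File dir)
    throws IOException {
  if (Shell.isJavaVersionAtLeast(9)) {
    // Java 9+ already truncates over-long prefixes/suffixes in createTempFile().
    return File.createTempFile(prefix, suffix, dir);
  }
  // pre-Java-9: shorten prefix/suffix as in the PR before creating the file
  return File.createTempFile(prefix, suffix, dir);
}
```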
##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3ADataBlocks.java:
##########
@@ -798,6 +800,8 @@ public String toString() {
* Buffer blocks to disk.
*/
static class DiskBlockFactory extends BlockFactory {
+ private static final String ESCAPED_FORWARD_SLASH = "EFS";
Review Comment:
think I might prefer something more distinguishable from text, e.g. "_FS_"
and "_BS_" so it's easier to read in a dir listing
##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java:
##########
@@ -1369,12 +1369,69 @@ public S3AEncryptionMethods getS3EncryptionAlgorithm() {
File createTmpFileForWrite(String pathStr, long size,
Configuration conf) throws IOException {
initLocalDirAllocatorIfNotInitialized(conf);
- Path path = directoryAllocator.getLocalPathForWrite(pathStr,
- size, conf);
+ Path path = directoryAllocator.getLocalPathForWrite(pathStr, size, conf);
File dir = new File(path.getParent().toUri().getPath());
- String prefix = path.getName();
- // create a temp file on this directory
- return File.createTempFile(prefix, null, dir);
+ return safeCreateTempFile(pathStr, null, dir);
+ }
+
+ // TODO remove this method when hadoop upgrades to a newer version of java than 1.8
+ /**
+ * Ensure that the temp file prefix and suffix don't exceed the maximum number of characters
+ * allowed by the underlying file system. This validation isn't required in Java 9+ since
+ * {@link java.io.File#createTempFile(String, String, File)} automatically truncates file names.
+ *
+ * @param prefix prefix for the temporary file
+ * @param suffix suffix for the temporary file
+ * @param dir directory to create the temporary file in
+ * @return a unique temporary file
+ * @throws IOException
+ */
+ static File safeCreateTempFile(String prefix, String suffix, File dir) throws IOException
+ {
+ // avoid validating multiple times.
+ // if the jvm running is version 9+ then defer to java.io.File validation implementation
+ if(Float.parseFloat(System.getProperty("java.class.version")) >= 53) {
+ return File.createTempFile(prefix, null, dir);
+ }
+
+ // if no suffix was defined assume the default
+ if(suffix == null) {
+ suffix = ".tmp";
+ }
+ // Use only the file name from the supplied prefix
+ prefix = (new File(prefix)).getName();
+
+ int prefixLength = prefix.length();
+ int suffixLength = suffix.length();
+ int maxRandomSuffixLen = 19; // Long.toUnsignedString(Long.MAX_VALUE).length()
+
+ String name;
+ int nameMax = 255; // unable to access the underlying FS directly, so assume 255
+ int excess = prefixLength + maxRandomSuffixLen + suffixLength - nameMax;
+
+ // shorten the prefix length if the file name exceeds 255 chars
Review Comment:
and replace explicit size with "too long"
##########
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3ABlockOutputArray.java:
##########
@@ -79,6 +82,46 @@ public void testRegularUpload() throws IOException {
verifyUpload("regular", 1024);
}
+ /**
+ * Test that the DiskBlock's local file doesn't result in error when the S3 key exceeds the max
+ * char limit of the local file system. Currently
+ * {@link java.io.File#createTempFile(String, String, File)} is being relied on to handle the
+ * truncation.
+ * @throws IOException
+ */
+ @Test
+ public void testDiskBlockCreate() throws IOException {
+ String s3Key = // 1024 char
+ "very_long_s3_key__very_long_s3_key__very_long_s3_key__very_long_s3_key__" +
+ "very_long_s3_key__very_long_s3_key__very_long_s3_key__very_long_s3_key__" +
+ "very_long_s3_key__very_long_s3_key__very_long_s3_key__very_long_s3_key__" +
+ "very_long_s3_key__very_long_s3_key__very_long_s3_key__very_long_s3_key__" +
+ "very_long_s3_key__very_long_s3_key__very_long_s3_key__very_long_s3_key__" +
+ "very_long_s3_key__very_long_s3_key__very_long_s3_key__very_long_s3_key__" +
+ "very_long_s3_key__very_long_s3_key__very_long_s3_key__very_long_s3_key__" +
+ "very_long_s3_key__very_long_s3_key__very_long_s3_key__very_long_s3_key__" +
+ "very_long_s3_key__very_long_s3_key__very_long_s3_key__very_long_s3_key__" +
+ "very_long_s3_key__very_long_s3_key__very_long_s3_key__very_long_s3_key__" +
+ "very_long_s3_key__very_long_s3_key__very_long_s3_key__very_long_s3_key__" +
+ "very_long_s3_key__very_long_s3_key__very_long_s3_key__very_long_s3_key__" +
+ "very_long_s3_key__very_long_s3_key__very_long_s3_key__very_long_s3_key__" +
+ "very_long_s3_key__very_long_s3_key__very_long_s3_key__very_long_s3_key__" +
+ "very_long_s3_key";
+ long blockSize = getFileSystem().getDefaultBlockSize();
+ try (S3ADataBlocks.BlockFactory diskBlockFactory =
+ new S3ADataBlocks.DiskBlockFactory(getFileSystem());
+ S3ADataBlocks.DataBlock dataBlock =
+ diskBlockFactory.create("spanId", s3Key, 1, blockSize, null);
+ ) {
+ String tmpDir = getConfiguration().get("hadoop.tmp.dir");
+ boolean created = Arrays.stream(
+ Objects.requireNonNull(new File(tmpDir).listFiles()))
+ .anyMatch(f -> f.getName().contains("very_long_s3_key"));
+ assertTrue(String.format("tmp file should have been created locally in %s", tmpDir), created);
Review Comment:
* use AssertJ. it should actually be possible to add an assert that the
array matches the requirement.
* there should be exactly one match; test suite setup should delete everything with
"very_long_s3_key" in it; so should teardown too, but to support things like
IDE debugging, it's best to do both
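A hedged sketch (fragment, not code from the PR) of what the AssertJ form of that check could look like, asserting exactly one buffered file carries the long key fragment:

```java
import static org.assertj.core.api.Assertions.assertThat;

File[] files = Objects.requireNonNull(new File(tmpDir).listFiles());
assertThat(files)
    .describedAs("temp files created in %s", tmpDir)
    .filteredOn(f -> f.getName().contains("very_long_s3_key"))
    .hasSize(1);
```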
##########
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/TestS3ABlockOutputStream.java:
##########
@@ -59,6 +63,9 @@ private S3ABlockOutputStream.BlockOutputStreamBuilder mockS3ABuilder() {
mock(S3ADataBlocks.BlockFactory.class);
long blockSize = Constants.DEFAULT_MULTIPART_SIZE;
WriteOperationHelper oHelper = mock(WriteOperationHelper.class);
+ AuditSpan auditSpan = mock(AuditSpan.class);
Review Comment:
you can just use `org.apache.hadoop.fs.s3a.audit.impl.NoopSpan` here; one
less thing to mock.
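As a one-line sketch of that substitution (assuming NoopSpan exposes a reusable INSTANCE field; otherwise construct one directly):

```java
// Replace the Mockito mock with the real no-op audit span; nothing to stub.
AuditSpan auditSpan = org.apache.hadoop.fs.s3a.audit.impl.NoopSpan.INSTANCE;
```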
##########
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3ABlockOutputArray.java:
##########
@@ -79,6 +82,46 @@ public void testRegularUpload() throws IOException {
verifyUpload("regular", 1024);
}
+ /**
+ * Test that the DiskBlock's local file doesn't result in error when the S3 key exceeds the max
+ * char limit of the local file system. Currently
+ * {@link java.io.File#createTempFile(String, String, File)} is being relied on to handle the
+ * truncation.
+ * @throws IOException
+ */
+ @Test
+ public void testDiskBlockCreate() throws IOException {
+ String s3Key = // 1024 char
Review Comment:
1. you could do this with a shorter string and simply concatenate it a few
times
2. it's duplicated in TestS3AFileSystem...it should be a constant in
S3ATestConstants
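One way to do both, sketched here (the constant name and the exact repeat count are illustrative; only its home in S3ATestConstants comes from the review):

```java
// In S3ATestConstants: build the over-long key by repetition instead of a 1024-char literal.
import org.apache.commons.lang3.StringUtils;

/** S3 key long enough to exceed a 255-character file name limit (~1026 chars). */
public static final String VERY_LONG_S3_KEY =
    StringUtils.repeat("very_long_s3_key__", 57);
```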
##########
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/TestS3AFileSystem.java:
##########
@@ -0,0 +1,110 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.s3a;
+
+import org.junit.After;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.rules.Timeout;
+
+import java.io.File;
+import java.io.IOException;
+import java.nio.file.Files;
+import java.util.regex.Pattern;
+
+/**
+ * Unit tests for {@link S3AFileSystem}.
+ */
+public class TestS3AFileSystem extends Assert {
+ final File TEMP_DIR = new File("target/build/test/TestS3AFileSystem");
+ final String longStr = // 1024 char
+ "very_long_s3_key__very_long_s3_key__very_long_s3_key__very_long_s3_key__"
+
+
"very_long_s3_key__very_long_s3_key__very_long_s3_key__very_long_s3_key__" +
+
"very_long_s3_key__very_long_s3_key__very_long_s3_key__very_long_s3_key__" +
+
"very_long_s3_key__very_long_s3_key__very_long_s3_key__very_long_s3_key__" +
+
"very_long_s3_key__very_long_s3_key__very_long_s3_key__very_long_s3_key__" +
+
"very_long_s3_key__very_long_s3_key__very_long_s3_key__very_long_s3_key__" +
+
"very_long_s3_key__very_long_s3_key__very_long_s3_key__very_long_s3_key__" +
+
"very_long_s3_key__very_long_s3_key__very_long_s3_key__very_long_s3_key__" +
+
"very_long_s3_key__very_long_s3_key__very_long_s3_key__very_long_s3_key__" +
+
"very_long_s3_key__very_long_s3_key__very_long_s3_key__very_long_s3_key__" +
+
"very_long_s3_key__very_long_s3_key__very_long_s3_key__very_long_s3_key__" +
+
"very_long_s3_key__very_long_s3_key__very_long_s3_key__very_long_s3_key__" +
+
"very_long_s3_key__very_long_s3_key__very_long_s3_key__very_long_s3_key__" +
+
"very_long_s3_key__very_long_s3_key__very_long_s3_key__very_long_s3_key__" +
+ "very_long_s3_key";
+ final String longStrTruncated = "very_long_s3_key__very_long_s3_key__";
+
+ @Rule
+ public Timeout testTimeout = new Timeout(30 * 1000);
+
+ @Before
+ public void init() throws IOException {
+ Files.createDirectories(TEMP_DIR.toPath());
+ }
+
+ @After
+ public void teardown() throws IOException {
+ File[] testOutputFiles = TEMP_DIR.listFiles();
+ for(File file: testOutputFiles) {
+ Files.delete(file.toPath());
+ }
+ Files.deleteIfExists(TEMP_DIR.toPath());
+ }
+
+ @Before
+ public void nameThread() {
+ Thread.currentThread().setName("JUnit");
+ }
+
+ /**
+ * Test the {@link S3AFileSystem#safeCreateTempFile(String, String, File)}.
+ * The code verifies that the input prefix and suffix don't exceed the file system's max name
+ * length and cause an exception.
+ *
+ * This test verifies the basic contract of the process.
+ */
+ @Test
+ public void testSafeCreateTempFile() throws Throwable {
+ // fitting name isn't changed
+ File noChangesRequired = S3AFileSystem.safeCreateTempFile("noChangesRequired", ".tmp", TEMP_DIR);
+ assertTrue(noChangesRequired.exists());
+ String noChangesRequiredName = noChangesRequired.getName();
+ assertTrue(noChangesRequiredName.startsWith("noChangesRequired"));
+ assertTrue(noChangesRequiredName.endsWith(".tmp"));
+
+ // a long prefix should be truncated
+ File excessivelyLongPrefix = S3AFileSystem.safeCreateTempFile(longStr, ".tmp", TEMP_DIR);
Review Comment:
each of these should be split into its own test method so they can fail
independently. I know, we ignore that in ITests, but that is because they're
integration tests with longer setup overhead
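A sketch of that split (method names are illustrative; assertions follow the AssertJ style requested elsewhere in this review):

```java
import static org.assertj.core.api.Assertions.assertThat;

@Test
public void testTempFileShortPrefixUnchanged() throws Throwable {
  File file = S3AFileSystem.safeCreateTempFile("noChangesRequired", ".tmp", TEMP_DIR);
  assertThat(file.getName())
      .startsWith("noChangesRequired")
      .endsWith(".tmp");
}

@Test
public void testTempFileLongPrefixTruncated() throws Throwable {
  File file = S3AFileSystem.safeCreateTempFile(longStr, ".tmp", TEMP_DIR);
  // truncated to fit the assumed 255-character file system limit
  assertThat(file.getName().length()).isLessThanOrEqualTo(255);
}
```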
##########
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/TestS3AFileSystem.java:
##########
@@ -0,0 +1,110 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.s3a;
+
+import org.junit.After;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.rules.Timeout;
+
+import java.io.File;
Review Comment:
nit: ordering...these should come first.
##########
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/TestS3AFileSystem.java:
##########
@@ -0,0 +1,110 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.s3a;
+
+import org.junit.After;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.rules.Timeout;
+
+import java.io.File;
+import java.io.IOException;
+import java.nio.file.Files;
+import java.util.regex.Pattern;
+
+/**
+ * Unit tests for {@link S3AFileSystem}.
+ */
+public class TestS3AFileSystem extends Assert {
+ final File TEMP_DIR = new File("target/build/test/TestS3AFileSystem");
+ final String longStr = // 1024 char
+ "very_long_s3_key__very_long_s3_key__very_long_s3_key__very_long_s3_key__"
+
+
"very_long_s3_key__very_long_s3_key__very_long_s3_key__very_long_s3_key__" +
+
"very_long_s3_key__very_long_s3_key__very_long_s3_key__very_long_s3_key__" +
+
"very_long_s3_key__very_long_s3_key__very_long_s3_key__very_long_s3_key__" +
+
"very_long_s3_key__very_long_s3_key__very_long_s3_key__very_long_s3_key__" +
+
"very_long_s3_key__very_long_s3_key__very_long_s3_key__very_long_s3_key__" +
+
"very_long_s3_key__very_long_s3_key__very_long_s3_key__very_long_s3_key__" +
+
"very_long_s3_key__very_long_s3_key__very_long_s3_key__very_long_s3_key__" +
+
"very_long_s3_key__very_long_s3_key__very_long_s3_key__very_long_s3_key__" +
+
"very_long_s3_key__very_long_s3_key__very_long_s3_key__very_long_s3_key__" +
+
"very_long_s3_key__very_long_s3_key__very_long_s3_key__very_long_s3_key__" +
+
"very_long_s3_key__very_long_s3_key__very_long_s3_key__very_long_s3_key__" +
+
"very_long_s3_key__very_long_s3_key__very_long_s3_key__very_long_s3_key__" +
+
"very_long_s3_key__very_long_s3_key__very_long_s3_key__very_long_s3_key__" +
+ "very_long_s3_key";
+ final String longStrTruncated = "very_long_s3_key__very_long_s3_key__";
+
+ @Rule
+ public Timeout testTimeout = new Timeout(30 * 1000);
+
+ @Before
+ public void init() throws IOException {
+ Files.createDirectories(TEMP_DIR.toPath());
+ }
+
+ @After
+ public void teardown() throws IOException {
+ File[] testOutputFiles = TEMP_DIR.listFiles();
+ for(File file: testOutputFiles) {
+ Files.delete(file.toPath());
+ }
+ Files.deleteIfExists(TEMP_DIR.toPath());
+ }
+
+ @Before
+ public void nameThread() {
+ Thread.currentThread().setName("JUnit");
+ }
+
+ /**
+ * Test the {@link S3AFileSystem#safeCreateTempFile(String, String, File)}.
+ * The code verifies that the input prefix and suffix don't exceed the file system's max name
+ * length and cause an exception.
+ *
+ * This test verifies the basic contract of the process.
+ */
+ @Test
+ public void testSafeCreateTempFile() throws Throwable {
+ // fitting name isn't changed
+ File noChangesRequired = S3AFileSystem.safeCreateTempFile("noChangesRequired", ".tmp", TEMP_DIR);
+ assertTrue(noChangesRequired.exists());
Review Comment:
bad news, use AssertJ. more verbose but the assertions it raises include
details about the failure, which is what I insist on, sorry. If a yetus build
fails, i want more than a line number. if this isn't done automatically, use
.describedAs() which takes a String.format() string + varargs list
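For example, the exists() check could become (sketch only):

```java
// describedAs() takes a String.format() template plus varargs, so a failure names the file.
assertThat(noChangesRequired)
    .describedAs("temp file created from prefix %s in %s", "noChangesRequired", TEMP_DIR)
    .exists();
```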
##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java:
##########
@@ -1369,12 +1369,69 @@ public S3AEncryptionMethods getS3EncryptionAlgorithm() {
File createTmpFileForWrite(String pathStr, long size,
Configuration conf) throws IOException {
initLocalDirAllocatorIfNotInitialized(conf);
- Path path = directoryAllocator.getLocalPathForWrite(pathStr,
- size, conf);
+ Path path = directoryAllocator.getLocalPathForWrite(pathStr, size, conf);
File dir = new File(path.getParent().toUri().getPath());
- String prefix = path.getName();
- // create a temp file on this directory
- return File.createTempFile(prefix, null, dir);
+ return safeCreateTempFile(pathStr, null, dir);
+ }
+
+ // TODO remove this method when hadoop upgrades to a newer version of java than 1.8
+ /**
+ * Ensure that the temp file prefix and suffix don't exceed the maximum number of characters
+ * allowed by the underlying file system. This validation isn't required in Java 9+ since
+ * {@link java.io.File#createTempFile(String, String, File)} automatically truncates file names.
+ *
+ * @param prefix prefix for the temporary file
+ * @param suffix suffix for the temporary file
+ * @param dir directory to create the temporary file in
+ * @return a unique temporary file
+ * @throws IOException
+ */
+ static File safeCreateTempFile(String prefix, String suffix, File dir) throws IOException
+ {
+ // avoid validating multiple times.
+ // if the jvm running is version 9+ then defer to java.io.File validation implementation
+ if(Float.parseFloat(System.getProperty("java.class.version")) >= 53) {
+ return File.createTempFile(prefix, null, dir);
+ }
+
+ // if no suffix was defined assume the default
+ if(suffix == null) {
+ suffix = ".tmp";
+ }
+ // Use only the file name from the supplied prefix
+ prefix = (new File(prefix)).getName();
+
+ int prefixLength = prefix.length();
+ int suffixLength = suffix.length();
+ int maxRandomSuffixLen = 19; // Long.toUnsignedString(Long.MAX_VALUE).length()
+
+ String name;
+ int nameMax = 255; // unable to access the underlying FS directly, so assume 255
+ int excess = prefixLength + maxRandomSuffixLen + suffixLength - nameMax;
+
+ // shorten the prefix length if the file name exceeds 255 chars
+ if (excess > 0) {
+ // Attempt to shorten the prefix length to no less than 3
+ prefixLength = shortenSubName(prefixLength, excess, 3);
+ prefix = prefix.substring(0, prefixLength);
+ }
+ // shorten the suffix if the file name still exceeds 255 chars
+ excess = prefixLength + maxRandomSuffixLen + suffixLength - nameMax;
+ if (excess > 0) {
+ // Attempt to shorten the suffix length to no less than 3
+ suffixLength = shortenSubName(suffixLength, excess, 3);
+ suffix = suffix.substring(0, suffixLength);
+ }
+
+ return File.createTempFile(prefix, suffix, dir);
+ }
+
+ private static int shortenSubName(int subNameLength, int excess, int nameMin) {
Review Comment:
add javadocs. a unit test would be good too and straightforward to add.
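A possible javadoc, sketched here (the wording is not from the PR, and the body shown is only a plausible guess inferred from the calling code):

```java
/**
 * Shorten one component (prefix or suffix) of a temp file name.
 *
 * @param subNameLength current length of the component
 * @param excess number of characters by which the full name is too long
 * @param nameMin minimum length the component may be shortened to
 * @return the new component length, never below {@code nameMin}
 */
private static int shortenSubName(int subNameLength, int excess, int nameMin) {
  // plausible body, inferred from the callers; the PR's actual implementation may differ
  return Math.max(subNameLength - excess, nameMin);
}
```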
##########
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/TestS3AFileSystem.java:
##########
@@ -0,0 +1,110 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.s3a;
+
+import org.junit.After;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.rules.Timeout;
+
+import java.io.File;
+import java.io.IOException;
+import java.nio.file.Files;
+import java.util.regex.Pattern;
+
+/**
+ * Unit tests for {@link S3AFileSystem}.
+ */
+public class TestS3AFileSystem extends Assert {
+ final File TEMP_DIR = new File("target/build/test/TestS3AFileSystem");
+ final String longStr = // 1024 char
+ "very_long_s3_key__very_long_s3_key__very_long_s3_key__very_long_s3_key__"
+
+
"very_long_s3_key__very_long_s3_key__very_long_s3_key__very_long_s3_key__" +
+
"very_long_s3_key__very_long_s3_key__very_long_s3_key__very_long_s3_key__" +
+
"very_long_s3_key__very_long_s3_key__very_long_s3_key__very_long_s3_key__" +
+
"very_long_s3_key__very_long_s3_key__very_long_s3_key__very_long_s3_key__" +
+
"very_long_s3_key__very_long_s3_key__very_long_s3_key__very_long_s3_key__" +
+
"very_long_s3_key__very_long_s3_key__very_long_s3_key__very_long_s3_key__" +
+
"very_long_s3_key__very_long_s3_key__very_long_s3_key__very_long_s3_key__" +
+
"very_long_s3_key__very_long_s3_key__very_long_s3_key__very_long_s3_key__" +
+
"very_long_s3_key__very_long_s3_key__very_long_s3_key__very_long_s3_key__" +
+
"very_long_s3_key__very_long_s3_key__very_long_s3_key__very_long_s3_key__" +
+
"very_long_s3_key__very_long_s3_key__very_long_s3_key__very_long_s3_key__" +
+
"very_long_s3_key__very_long_s3_key__very_long_s3_key__very_long_s3_key__" +
+
"very_long_s3_key__very_long_s3_key__very_long_s3_key__very_long_s3_key__" +
+ "very_long_s3_key";
+ final String longStrTruncated = "very_long_s3_key__very_long_s3_key__";
+
+ @Rule
+ public Timeout testTimeout = new Timeout(30 * 1000);
+
+ @Before
+ public void init() throws IOException {
+ Files.createDirectories(TEMP_DIR.toPath());
Review Comment:
use `S3ATestUtils` code here for handling of parallel test cases and ease of
future maintenance
```
Configuration conf = new Configuration(false)
S3ATestUtils.prepareTestConfiguration(conf)
conf.get(BUFFER_DIR)
```
##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java:
##########
@@ -1369,12 +1369,69 @@ public S3AEncryptionMethods getS3EncryptionAlgorithm() {
File createTmpFileForWrite(String pathStr, long size,
Configuration conf) throws IOException {
initLocalDirAllocatorIfNotInitialized(conf);
- Path path = directoryAllocator.getLocalPathForWrite(pathStr,
- size, conf);
+ Path path = directoryAllocator.getLocalPathForWrite(pathStr, size, conf);
File dir = new File(path.getParent().toUri().getPath());
- String prefix = path.getName();
- // create a temp file on this directory
- return File.createTempFile(prefix, null, dir);
+ return safeCreateTempFile(pathStr, null, dir);
+ }
+
+ // TODO remove this method when hadoop upgrades to a newer version of java than 1.8
+ /**
+ * Ensure that the temp file prefix and suffix don't exceed the maximum number of characters
+ * allowed by the underlying file system. This validation isn't required in Java 9+ since
+ * {@link java.io.File#createTempFile(String, String, File)} automatically truncates file names.
+ *
+ * @param prefix prefix for the temporary file
+ * @param suffix suffix for the temporary file
+ * @param dir directory to create the temporary file in
+ * @return a unique temporary file
+ * @throws IOException
+ */
+ static File safeCreateTempFile(String prefix, String suffix, File dir) throws IOException
+ {
+ // avoid validating multiple times.
+ // if the jvm running is version 9+ then defer to java.io.File validation implementation
+ if(Float.parseFloat(System.getProperty("java.class.version")) >= 53) {
+ return File.createTempFile(prefix, null, dir);
+ }
+
+ // if no suffix was defined assume the default
+ if(suffix == null) {
+ suffix = ".tmp";
+ }
+ // Use only the file name from the supplied prefix
+ prefix = (new File(prefix)).getName();
+
+ int prefixLength = prefix.length();
+ int suffixLength = suffix.length();
+ int maxRandomSuffixLen = 19; // Long.toUnsignedString(Long.MAX_VALUE).length()
+
+ String name;
+ int nameMax = 255; // unable to access the underlying FS directly, so assume 255
+ int excess = prefixLength + maxRandomSuffixLen + suffixLength - nameMax;
+
+ // shorten the prefix length if the file name exceeds 255 chars
+ if (excess > 0) {
+ // Attempt to shorten the prefix length to no less than 3
+ prefixLength = shortenSubName(prefixLength, excess, 3);
Review Comment:
again, make a constant and use in both places
##########
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3ABlockOutputArray.java:
##########
@@ -79,6 +82,46 @@ public void testRegularUpload() throws IOException {
verifyUpload("regular", 1024);
}
+ /**
+ * Test that the DiskBlock's local file doesn't result in error when the S3 key exceeds the max
+ * char limit of the local file system. Currently
+ * {@link java.io.File#createTempFile(String, String, File)} is being relied on to handle the
Review Comment:
this comment is out of date
> Improve S3ABlockOutputStream recovery
> -------------------------------------
>
> Key: HADOOP-18706
> URL: https://issues.apache.org/jira/browse/HADOOP-18706
> Project: Hadoop Common
> Issue Type: Improvement
> Components: fs/s3
> Reporter: Chris Bevard
> Assignee: Chris Bevard
> Priority: Minor
> Labels: pull-request-available
>
> If an application crashes during an S3ABlockOutputStream upload, it's
> possible to complete the upload if fast.upload.buffer is set to disk by
> uploading the s3ablock file with putObject as the final part of the multipart
> upload. If the application has multiple uploads running in parallel though
> and they're on the same part number when the application fails, then there is
> no way to determine which file belongs to which object, and recovery of
> either upload is impossible.
> If the temporary file name for disk buffering included the s3 key, then every
> partial upload would be recoverable.
> h3. Important disclaimer
> This change does not directly add the Syncable semantics required by
> applications that need {{Syncable.hsync()}} to only return after all pending data
> has been durably written to the destination path. S3 is not a filesystem and this
> change does not make it so.
> What it does do is assist anyone trying to implement some post-crash recovery
> process which
> # interrogates s3 to identify pending uploads to a specific path and get a
> list of uploaded blocks yet to be committed
> # scans the local fs.s3a.buffer dir directories to identify in-progress-write
> blocks for the same target destination. That is, those which were being
> uploaded, queued for upload, and the single "new data being written to"
> block for an output stream
> # uploads all those pending blocks
> # generates a new POST to complete a multipart upload with all the blocks in
> the correct order
> All this patch does is ensure the buffered block filenames include the final
> path and block ID, to aid in identifying which blocks need to be uploaded and
> in what order.
> h2. warning
> causes HADOOP-18744 - always include the relevant fix when backporting