Enno Shioji created HADOOP-11444:
------------------------------------
Summary: Jets3tFileSystemStore fails to remove initial slash from
object keys, resulting in objects with double forward slashes being stored
Key: HADOOP-11444
URL: https://issues.apache.org/jira/browse/HADOOP-11444
Project: Hadoop Common
Issue Type: Bug
Components: fs/s3
Affects Versions: 2.2.0
Environment: java version "1.7.0_71"
Java(TM) SE Runtime Environment (build 1.7.0_71-b14)
Java HotSpot(TM) 64-Bit Server VM (build 24.71-b01, mixed mode)
Reporter: Enno Shioji
Priority: Minor
While writing to S3 using Spark 1.2.0's ReceiverInputDStream#saveAsTextFiles
with an S3 URL ("s3://fake-test/1234"), I noticed that files are written with
double forward slashes (e.g. "s3://fake-test//1234/-1419334280000/").
After debugging, it seems this is caused by
Jets3tFileSystemStore#pathToKey(path), which returns "/fake-test/1234/..." for
the input "s3://fake-test/1234/...", when it should strip the leading forward
slash.
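
As a rough illustration of the symptom outside Hadoop, the sketch below uses
plain java.net.URI as a stand-in for Path#toUri() (an assumption; the exact
string seen inside Jets3tFileSystemStore may differ). DoubleSlashDemo and
objectUrl are hypothetical names used only here. It shows that a key which
keeps its leading slash produces a double slash once it is joined to the
bucket:

```java
import java.net.URI;

// Illustration only: java.net.URI stands in for Hadoop's Path#toUri().
class DoubleSlashDemo {
    // Hypothetical helper: renders an object location as scheme://bucket/key.
    static String objectUrl(String scheme, String bucket, String key) {
        return scheme + "://" + bucket + "/" + key;
    }

    public static void main(String[] args) {
        URI uri = URI.create("s3://fake-test/1234/-1419334280000/");
        String bucket = uri.getHost(); // the bucket is the URI authority
        String rawKey = uri.getPath(); // path component, leading slash kept

        // Key used as-is: the separator and the leading slash combine into "//".
        System.out.println(objectUrl("s3", bucket, rawKey));
        // Key with the initial slash removed: a single separator remains.
        System.out.println(objectUrl("s3", bucket, rawKey.substring(1)));
    }
}
```

The first line printed contains "fake-test//1234", matching the layout
observed in the bucket; the second does not.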
When I used an s3n URL, and hence Jets3tNativeFileSystemStore, the double
slashes went away. Here is a comparison of their pathToKey implementations:
Jets3tNativeFileSystemStore's implementation of pathToKey is:
======
private static String pathToKey(Path path) {
  if (path.toUri().getScheme() != null && path.toUri().getPath().isEmpty()) {
    // allow uris without trailing slash after bucket to refer to root,
    // like s3n://mybucket
    return "";
  }
  if (!path.isAbsolute()) {
    throw new IllegalArgumentException("Path must be absolute: " + path);
  }
  String ret = path.toUri().getPath().substring(1); // remove initial slash
  if (ret.endsWith("/") && (ret.indexOf("/") != ret.length() - 1)) {
    ret = ret.substring(0, ret.length() - 1);
  }
  return ret;
}
======
whereas Jets3tFileSystemStore uses:
======
private String pathToKey(Path path) {
  if (!path.isAbsolute()) {
    throw new IllegalArgumentException("Path must be absolute: " + path);
  }
  return path.toUri().getPath();
}
======
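
One possible direction, sketched under assumptions rather than proposed as a
patch: have the block store strip the initial slash the same way the native
store does. To stay runnable without Hadoop on the classpath, the sketch takes
the URI path as a plain String instead of a Path; PathToKeySketch is a
hypothetical name, and the trailing-slash trimming done by the native store is
deliberately left out.

```java
// Sketch only: works on the URI path string so it runs without Hadoop.
class PathToKeySketch {
    static String pathToKey(String uriPath) {
        if (uriPath.isEmpty()) {
            // A bucket URI with no path refers to the root.
            return "";
        }
        if (!uriPath.startsWith("/")) {
            throw new IllegalArgumentException("Path must be absolute: " + uriPath);
        }
        return uriPath.substring(1); // remove initial slash
    }

    public static void main(String[] args) {
        System.out.println(pathToKey("/1234/-1419334280000/"));
    }
}
```

A change along these lines would presumably alter the keys under which
existing data was stored, so whether objects already written with the leading
slash stay readable would need to be checked separately.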