Github user cresny commented on a diff in the pull request:
https://github.com/apache/flink/pull/2288#discussion_r74974171
--- Diff:
flink-streaming-java/src/main/java/org/apache/flink/streaming/util/HDFSCopyFromLocal.java
---
@@ -62,4 +64,34 @@ public void run() {
throw asyncException.f0;
}
}
+
+
+	/**
+	 * Ensure that target path terminates with a new directory to be created by fs. If remoteURI does not specify a new
+	 * directory, append local directory name.
+	 * @param fs
+	 * @param localPath
+	 * @param remoteURI
+	 * @return
+	 * @throws IOException
+	 */
+	protected static URI checkInitialDirectory(final FileSystem fs, final File localPath, final URI remoteURI) throws IOException {
+		if (localPath.isDirectory()) {
+			Path remotePath = new Path(remoteURI);
+			if (fs.exists(remotePath)) {
+				return new Path(remotePath, localPath.getName()).toUri();
+			}
+		}
+		return remoteURI;
+	}
+
+	protected static void copyFromLocalFile(final FileSystem fs, final File localPath, final URI remotePath) throws Exception {
--- End diff --
It looks like that won't work as-is, since `flink-hadoop-compatibility` is
not a dependency of either `flink-streaming-java` or `flink-yarn`. Maybe those
two are entirely disjoint? I'll look a little deeper.
On another note -- the overload of `FileSystem.copyFromLocal` used by Flink
defaults to "overwrite=false". This has little impact on HDFS, but it carries
real costs on S3, both in performance and in request charges, which is why S3
writes are typically a blind overwrite. Given the usage here -- savepoint
backups and YARN staging -- overwrite=true seems appropriate for the general
case. For HDFS it would merely incur a negligible existence check before the
write (rather than an optimistic write). It's a very minor code change, even
if it may warrant a longer explanation. Perhaps create a new issue?
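For concreteness, here is a minimal sketch of what that change could look like
(hypothetical call site; the class and method names below are illustrative).
It assumes Hadoop's `org.apache.hadoop.fs.FileSystem` API, whose four-argument
overload `copyFromLocalFile(boolean delSrc, boolean overwrite, Path src, Path dst)`
exposes the overwrite flag directly:

```java
import java.io.File;
import java.net.URI;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CopyOverwriteSketch {

	protected static void copyFromLocalFile(final FileSystem fs, final File localPath,
			final URI remotePath) throws Exception {
		// overwrite = true: skips the per-object existence check that is cheap on
		// HDFS but comparatively costly on S3 (an extra request per object).
		fs.copyFromLocalFile(
				false,                          // delSrc: keep the local copy
				true,                           // overwrite: blind-overwrite the target
				new Path(localPath.toURI()),    // source on the local file system
				new Path(remotePath));          // destination on fs (HDFS, S3, ...)
	}
}
```

This sketch requires the Hadoop client on the classpath; the only behavioral
difference from the current code is the second argument.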