Github user cresny commented on a diff in the pull request: https://github.com/apache/flink/pull/2288#discussion_r74974171 --- Diff: flink-streaming-java/src/main/java/org/apache/flink/streaming/util/HDFSCopyFromLocal.java --- @@ -62,4 +64,34 @@ public void run() { throw asyncException.f0; } } + + /** + * Ensure that target path terminates with a new directory to be created by fs. If remoteURI does not specify a new + * directory, append local directory name. + * @param fs + * @param localPath + * @param remoteURI + * @return + * @throws IOException + */ + protected static URI checkInitialDirectory(final FileSystem fs,final File localPath, final URI remoteURI) throws IOException { + if (localPath.isDirectory()) { + Path remotePath = new Path(remoteURI); + if (fs.exists(remotePath)) { + return new Path(remotePath,localPath.getName()).toUri(); + } + } + return remoteURI; + } + + protected static void copyFromLocalFile(final FileSystem fs, final File localPath, final URI remotePath) throws Exception { --- End diff -- It looks like that won't work as-is since `flink-hadoop-compatability` is not a dependency of either `flink-streaming-java` or `flink-yarn`. Maybe those two are entirely disjoint? I'll look a litter deeper. On another note-- the overloaded `FileSystem.copyFromLocal` used by Flink defaults to "overwrite=false". This does not have much of an impact in HDFS, but it carries real costs in S3, both from a performance and financial standpoint, and because of this S3 writes are typically a blind overwrite. Given the usage here -- savepoint backups and YARN staging -- it seems apropiate for the general case to be an overwrite = true. For the HDFS case it would incur a negligible check before write (rather than an optimistic write). It's a very minor code change even if it may warrant a longer explanation. Perhaps create a new issue?
--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---