steveloughran commented on a change in pull request #2399:
URL: https://github.com/apache/hadoop/pull/2399#discussion_r522970306
##########
File path:
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/AbstractS3ACommitter.java
##########
@@ -1044,6 +1166,155 @@ protected void abortPendingUploads(
}
}
+  /**
+   * Scan for active uploads and list them along with a warning message.
+   * Errors are ignored.
+   * @param path output path of job.
+   */
+  protected void warnOnActiveUploads(final Path path) {
+    List<MultipartUpload> pending;
+    try {
+      pending = getCommitOperations()
+          .listPendingUploadsUnderPath(path);
+    } catch (IOException e) {
+      LOG.debug("Failed to list uploads under {}",
+          path, e);
+      return;
+    }
+    if (!pending.isEmpty()) {
+      // log a warning
+      LOG.warn("{} active upload(s) in progress under {}",
+          pending.size(),
+          path);
+      LOG.warn("Either jobs are running concurrently"
+          + " or failed jobs are not being cleaned up");
+      // and the paths + timestamps
+      DateFormat df = DateFormat.getDateTimeInstance();
+      pending.forEach(u ->
+          LOG.info("[{}] {}",
+              df.format(u.getInitiated()),
+              u.getKey()));
+      if (shouldAbortUploadsInCleanup()) {
+        LOG.warn("This committer will abort these uploads in job cleanup");
+      }
+    }
+  }
+
+  /**
+   * Build the job UUID.
+   *
+   * <p>
+   *  In MapReduce jobs, the application ID is issued by YARN, and
+   *  unique across all jobs.
+   * </p>
+   * <p>
+   * Spark will use a fake app ID based on the current time.
+   * This can lead to collisions on busy clusters.
+   *
+   * </p>
+   * <ol>
+   *   <li>Value of
+   *   {@link InternalCommitterConstants#FS_S3A_COMMITTER_UUID}.</li>
+   *   <li>Value of
+   *   {@link InternalCommitterConstants#SPARK_WRITE_UUID}.</li>
+   *   <li>If enabled: Self-generated uuid.</li>
+   *   <li>If not disabled: Application ID</li>
Review comment:
added the extra details
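
For readers following the javadoc above, the lookup order it lists could be sketched roughly as below. This is only an illustration, not the code in the PR: the class `JobUuidSketch`, the `Map`-based configuration, the two boolean flags, the string values given to the two constants, and the exception thrown when the application ID fallback is disabled are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

/** Hypothetical sketch of the UUID lookup order; not the committer's code. */
final class JobUuidSketch {

  // Assumed stand-ins for the InternalCommitterConstants values.
  static final String FS_S3A_COMMITTER_UUID = "fs.s3a.committer.uuid";
  static final String SPARK_WRITE_UUID = "spark.sql.sources.writeJobUUID";

  /**
   * Resolve a job UUID in the order given by the javadoc list.
   * @param conf stand-in for the job configuration (assumption)
   * @param generateUuid hypothetical flag: self-generate a UUID if unset
   * @param appIdAllowed hypothetical flag: application ID fallback permitted
   * @param applicationId the YARN (or Spark-faked) application ID
   */
  static String buildJobUUID(Map<String, String> conf,
      boolean generateUuid, boolean appIdAllowed, String applicationId) {
    // 1. Explicitly configured committer UUID wins.
    String uuid = conf.getOrDefault(FS_S3A_COMMITTER_UUID, "");
    if (!uuid.isEmpty()) {
      return uuid;
    }
    // 2. Otherwise the UUID passed down by Spark, if any.
    uuid = conf.getOrDefault(SPARK_WRITE_UUID, "");
    if (!uuid.isEmpty()) {
      return uuid;
    }
    // 3. If enabled: self-generated UUID.
    if (generateUuid) {
      return UUID.randomUUID().toString();
    }
    // 4. If not disabled: the application ID.
    if (appIdAllowed) {
      return applicationId;
    }
    // Assumed behaviour for the sketch when every source is unavailable.
    throw new IllegalStateException("No job UUID available");
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    conf.put(SPARK_WRITE_UUID, "spark-write-1");
    System.out.println(buildJobUUID(conf, false, true, "application_0001"));
  }
}
```

With only the Spark value set, as in `main`, the Spark-supplied ID is chosen ahead of the application ID, which is the point of the ordering given the collision risk noted in the javadoc.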