steveloughran commented on code in PR #3289:
URL: https://github.com/apache/hadoop/pull/3289#discussion_r891168186
##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/CommitConstants.java:
##########
@@ -222,13 +226,18 @@ private CommitConstants() {
/**
* Number of threads in committers for parallel operations on files
* (upload, commit, abort, delete...): {@value}.
+ * Two thread pools of this size are created, one for the outer
+ * task-level parallelism, and one for parallel execution
+ * within tasks (POSTs to commit individual uploads).
+ * If the value is negative, it is inverted and then multiplied
+ * by the number of cores in the CPU.
*/
public static final String FS_S3A_COMMITTER_THREADS =
"fs.s3a.committer.threads";
/**
* Default value for {@link #FS_S3A_COMMITTER_THREADS}: {@value}.
*/
- public static final int DEFAULT_COMMITTER_THREADS = 8;
+ public static final int DEFAULT_COMMITTER_THREADS = -4;
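
For context, a minimal sketch of the "negative value means scale by core count" convention described in the javadoc above. The class and method names are illustrative only, not the actual S3A committer code:

```java
// Illustrative sketch: resolve a configured thread count where a negative
// value is inverted and multiplied by the number of available processors.
// Not the actual S3A implementation.
public final class ThreadCountExample {

  /** Hypothetical default mirroring DEFAULT_COMMITTER_THREADS = -4. */
  private static final int DEFAULT_COMMITTER_THREADS = -4;

  /** Negative values are inverted and scaled by the core count. */
  static int resolveThreads(int configured) {
    if (configured >= 0) {
      return configured;
    }
    return -configured * Runtime.getRuntime().availableProcessors();
  }

  public static void main(String[] args) {
    // On an 8-core machine this prints 32 (4 x 8).
    System.out.println(resolveThreads(DEFAULT_COMMITTER_THREADS));
  }
}
```
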
Review Comment:
Yeah, I was thinking about this feature. Maybe we will go back to a fixed
number, just a larger one; that prevents "surprises" on different deployments.
Fixed at 32: larger, but not too large. Another strategy would be to somehow
just pick up the thread pool of the S3A instance, but limit it to a given value.
Benefit: recycling across jobs and tasks.
The reason the committers are out of that pool is just that the code for
parallel commit came from Netflix and we never tried to merge them.
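
As a rough sketch of the "larger, but not too large" capping idea discussed above (the names and the cap of 32 below are hypothetical, not the committer's actual code):

```java
// Hypothetical sketch: bound a (possibly core-scaled) thread count at a
// fixed cap before building the committer's pool. Not the S3A implementation.
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public final class BoundedCommitterPoolExample {

  /** Hypothetical hard upper bound on committer threads. */
  private static final int MAX_COMMITTER_THREADS = 32;

  /** Clamp the computed thread count to [1, MAX_COMMITTER_THREADS]. */
  static int boundedThreads(int computed) {
    return Math.max(1, Math.min(computed, MAX_COMMITTER_THREADS));
  }

  public static void main(String[] args) {
    // e.g. 4 threads per core, then capped at 32.
    int computed = 4 * Runtime.getRuntime().availableProcessors();
    ExecutorService pool =
        Executors.newFixedThreadPool(boundedThreads(computed));
    // ... commit/abort/delete work would be submitted here ...
    pool.shutdown();
  }
}
```
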
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]