ZihanLi58 commented on code in PR #3685:
URL: https://github.com/apache/gobblin/pull/3685#discussion_r1175743668
##########
gobblin-yarn/src/main/java/org/apache/gobblin/yarn/YarnAutoScalingManager.java:
##########
@@ -222,8 +223,22 @@ void runInternal() {
if (jobContext != null) {
log.debug("JobContext {} num partitions {}", jobContext,
jobContext.getPartitionSet().size());
-
inUseInstances.addAll(jobContext.getPartitionSet().stream().map(jobContext::getAssignedParticipant)
- .filter(Objects::nonNull).collect(Collectors.toSet()));
+ inUseInstances.addAll(jobContext.getPartitionSet().stream().map(i
-> {
+ if(jobContext.getPartitionState(i) == null) {
+ return jobContext.getAssignedParticipant(i);
+ }
+ if (!jobContext.getPartitionState(i).equals(
+ TaskPartitionState.ERROR) &&
!jobContext.getPartitionState(i).equals(
Review Comment:
here are some of my consideration
1. null check is to prevent the NPE exception when I try to compare the task
state with a specific value and try to return getPartitionState specifically to
maintain backward compatibility. But I'll refactor the code a little bit to
make it look clean
2. To ensure that logs are helpful and not noisy, I will reduce the amount
of information logged for retriable task states. Even if the instances are
added to the in-use map, they will be removed automatically during the next run
of the method as retry assigns them to new instances, causing old ones to be
removed automatically.
3. To address the issue of tasks failing multiple times, I will add a log
for tasks that have a high number of attempts. This will be clearer than
logging the unusual task state every time.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]