[GitHub] [gobblin] ZihanLi58 commented on a diff in pull request #3685: [GOBBLIN-1822]Logging Abnormal Helix Task States

via GitHub Mon, 24 Apr 2023 13:19:33 -0700


ZihanLi58 commented on code in PR #3685:
URL: https://github.com/apache/gobblin/pull/3685#discussion_r1175743668



##########
gobblin-yarn/src/main/java/org/apache/gobblin/yarn/YarnAutoScalingManager.java:
##########
@@ -222,8 +223,22 @@ void runInternal() {
           if (jobContext != null) {
             log.debug("JobContext {} num partitions {}", jobContext, 
jobContext.getPartitionSet().size());
 
-            
inUseInstances.addAll(jobContext.getPartitionSet().stream().map(jobContext::getAssignedParticipant)
-                .filter(Objects::nonNull).collect(Collectors.toSet()));
+            inUseInstances.addAll(jobContext.getPartitionSet().stream().map(i 
-> {
+              if(jobContext.getPartitionState(i) == null) {
+                return jobContext.getAssignedParticipant(i);
+              }
+              if (!jobContext.getPartitionState(i).equals(
+                  TaskPartitionState.ERROR) && 
!jobContext.getPartitionState(i).equals(

Review Comment:
   here are some of my consideration
   1. null check is to prevent the NPE exception when I try to compare the task 
state with a specific value and try to return getPartitionState specifically to 
maintain backward compatibility. But I'll refactor the code a little bit to 
make it look clean
   2. To ensure that logs are helpful and not noisy, I will reduce the amount 
of information logged for retriable task states. Even if the instances are 
added to the in-use map, they will be removed automatically during the next run 
of the method as retry assigns them to new instances, causing old ones to be 
removed automatically.
   3. To address the issue of tasks failing multiple times, I will add a log 
for tasks that have a high number of attempts. This will be clearer than 
logging the unusual task state every time.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [gobblin] ZihanLi58 commented on a diff in pull request #3685: [GOBBLIN-1822]Logging Abnormal Helix Task States

Reply via email to