hterik commented on code in PR #31996:
URL: https://github.com/apache/airflow/pull/31996#discussion_r1234843006


##########
airflow/jobs/job.py:
##########
@@ -213,7 +213,11 @@ def heartbeat(
                 self.log.debug("[heartbeat]")
         except OperationalError:
             Stats.incr(convert_camel_to_snake(self.__class__.__name__) + "_heartbeat_failure", 1, 1)
-            self.log.exception("%s heartbeat got an exception", self.__class__.__name__)
+            if self.is_alive():
+                self.log.error("%s heartbeat failed with error. Scheduler may go into unhealthy state", self.__class__.__name__)
+            else:
+                self.log.error("%s heartbeat failed with error. Scheduler is in unhealthy state", self.__class__.__name__)

Review Comment:
   As an Airflow user reading the logs of a DAG, I would not understand what 
this means for me. Is this something I have to react to? Do I need to contact my 
admins? Are the DAG results corrupted? Should I restart the scheduler? 
   
   This error isn't necessarily a problem with the scheduler. More often, the 
executor cannot reach the database because of transient network problems. As 
long as the error is transient and recovers shortly, there is usually no 
consequence. The log message should reflect this.
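   
   A rough sketch of the kind of wording I have in mind, not a drop-in patch. 
The helper names `heartbeat_failure_message` and `log_heartbeat_failure` are 
hypothetical and not part of Airflow, `still_alive` stands in for the result 
of `self.is_alive()`, and the choice of WARNING for the transient case is only 
a suggestion:

```python
import logging

log = logging.getLogger("heartbeat_sketch")


def heartbeat_failure_message(job_name: str, still_alive: bool) -> str:
    """Build a user-facing message for a failed heartbeat (hypothetical helper)."""
    if still_alive:
        # The job is still considered alive, so this is likely a transient
        # database or network problem that needs no user action.
        return (
            f"{job_name} heartbeat failed: could not reach the database. "
            "This is often transient; no action is needed if it recovers shortly."
        )
    # The job is no longer considered alive, so tell the reader what to do.
    return (
        f"{job_name} heartbeat keeps failing and the job is now unhealthy. "
        "Check database connectivity and contact your administrator if it persists."
    )


def log_heartbeat_failure(job_name: str, still_alive: bool) -> int:
    """Log at WARNING while the job is alive, ERROR otherwise; return the level."""
    level = logging.WARNING if still_alive else logging.ERROR
    log.log(level, heartbeat_failure_message(job_name, still_alive))
    return level
```

   In `heartbeat()` this would be called from the `except OperationalError` 
branch with `self.__class__.__name__` and `self.is_alive()`.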


