This is an automated email from the ASF dual-hosted git repository.
github-merge-queue[bot] pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/texera.git
The following commit(s) were added to refs/heads/main by this push:
new a854812861 fix(amber): avoids false negative on _check_heartbeat when
connection is established but close raises exception (#5027)
a854812861 is described below
commit a8548128615ca5e9402384bf0752328c7e39ab12
Author: nathant27 <[email protected]>
AuthorDate: Sat May 30 11:24:00 2026 -0700
fix(amber): avoids false negative on _check_heartbeat when connection is
established but close raises exception (#5027)
<!--
Thanks for sending a pull request (PR)! Here are some tips for you:
1. If this is your first time, please read our contributor guidelines:
[Contributing to
Texera](https://github.com/apache/texera/blob/main/CONTRIBUTING.md)
2. Ensure you have added or run the appropriate tests for your PR
3. If the PR is work in progress, mark it a draft on GitHub.
4. Please write your PR title to summarize what this PR proposes, we
are following Conventional Commits style for PR titles as well.
5. Be sure to keep the PR description updated to reflect all changes.
-->
### What changes were proposed in this PR?
<!--
Please clarify what changes you are proposing. The purpose of this
section
is to outline the changes. Here are some tips for you:
1. If you propose a new API, clarify the use case for a new API.
2. If you fix a bug, you can clarify why it is a bug.
3. If it is a refactoring, clarify what has been changed.
3. It would be helpful to include a before-and-after comparison using
screenshots or GIFs.
4. Please consider writing useful notes for better and faster reviews.
-->
Old version of Heartbeat._check_heartbeat(self) wraps the socket
connection and socket close in a single try except, and handles with the
same try except. The issue is that if close raises after a successful
connection, there is a false negative since it will handle it as if
socket connection had failed. In Heartbeat.run(self), this could cause
the server to shutdown on close raising exception since _check_heartbeat
will return False in this case.
New version separates the connection handling and the close in two
adjacent try except blocks, and returns true as long as connection is
successfully established as well, getting rid of the false negative.
As per discussed in the issue, will keep the double check for
_check_heartbeat in Heartbeat.run.
Old Version:
```
def _check_heartbeat(self) -> bool:
"""
Attempt to connect to JVM on the specific port. If succeeds, it
means the
socket is still available and the JVM is still alive. Otherwise,
the JVM
might have been gone.
:return: bool, indicating if the socket is available.
"""
try:
temp_socket = socket.create_connection(
(self._parsed_server_host, self._parsed_server_port),
timeout=1
)
temp_socket.close()
return True
except Exception as e:
logger.warning(f"Server is down with exception: {e}")
return False
```
New Version:
```
def _check_heartbeat(self) -> bool:
"""
Attempt to connect to the JVM port. If connection failure, then JVM
is dead, or if connection success
then JVM is alive even if close() raises. Logs on connection
failure and on close error.
:return: bool, True if connect succeeded, False if connect failed.
"""
try:
temp_socket = socket.create_connection(
(self._parsed_server_host, self._parsed_server_port),
timeout=1
)
except Exception as e:
logger.warning(f"Server is down with exception: {e}")
return False
try:
temp_socket.close()
except Exception as e:
logger.warning(f"Failed to close socket: {e}")
return True
```
### Any related issues, documentation, discussions?
Closes #4726
### How was this PR tested?
Changed old test to reflect new behavior(True on connection succeeds but
socket close rasises)
OLD TEST(REMOVED):
```
def test_returns_false_when_socket_close_raises(self):
# Pins the false-negative path: connect succeeds but the subsequent
# close() throws (e.g. broken pipe on a half-open socket). The bare
# `except Exception` in _check_heartbeat() catches it and reports
# "server down", and a regression that narrows that handler would be
# caught here.
hb = make_heartbeat()
fake_sock = MagicMock()
fake_sock.close.side_effect = OSError("close failed")
with patch(
"core.runnables.heartbeat.socket.create_connection",
return_value=fake_sock,
):
assert hb._check_heartbeat() is False
```
NEW TEST:
```
def
test_returns_true_when_connection_succeeds_but_socket_close_raises(self):
hb = make_heartbeat()
fake_sock = MagicMock()
fake_sock.close.side_effect = OSError("close failed")
with patch(
"core.runnables.heartbeat.socket.create_connection",
return_value=fake_sock,
):
assert hb._check_heartbeat() is True
```
Passes on existing old tests
### Was this PR authored or co-authored using generative AI tooling?
No
---
amber/src/main/python/core/runnables/heartbeat.py | 16 ++++++++++------
amber/src/test/python/core/runnables/test_heartbeat.py | 10 +++-------
2 files changed, 13 insertions(+), 13 deletions(-)
diff --git a/amber/src/main/python/core/runnables/heartbeat.py
b/amber/src/main/python/core/runnables/heartbeat.py
index 1e4c45837d..941782f699 100644
--- a/amber/src/main/python/core/runnables/heartbeat.py
+++ b/amber/src/main/python/core/runnables/heartbeat.py
@@ -79,22 +79,26 @@ class Heartbeat(Runnable, Stoppable):
def _check_heartbeat(self) -> bool:
"""
- Attempt to connect to JVM on the specific port. If succeeds, it means
the
- socket is still available and the JVM is still alive. Otherwise, the
JVM
- might have been gone.
+ Attempt to connect to the JVM port. Returns True iff
socket.create_connection succeeds;
+ a close() failure after a successful connection is logged but DOES NOT
flip the return value.
- :return: bool, indicating if the socket is available.
+ :return: bool, True if connect succeeded, False if connect failed.
"""
try:
temp_socket = socket.create_connection(
(self._parsed_server_host, self._parsed_server_port), timeout=1
)
- temp_socket.close()
- return True
except Exception as e:
logger.warning(f"Server is down with exception: {e}")
return False
+ try:
+ temp_socket.close()
+ except Exception as e:
+ logger.warning(f"Failed to close socket: {e}")
+
+ return True
+
@overrides
def stop(self):
# clean up every process under the python worker
diff --git a/amber/src/test/python/core/runnables/test_heartbeat.py
b/amber/src/test/python/core/runnables/test_heartbeat.py
index 76dc93071a..0fe5b321f5 100644
--- a/amber/src/test/python/core/runnables/test_heartbeat.py
+++ b/amber/src/test/python/core/runnables/test_heartbeat.py
@@ -79,12 +79,7 @@ class TestCheckHeartbeat:
):
assert hb._check_heartbeat() is False
- def test_returns_false_when_socket_close_raises(self):
- # Pins the false-negative path: connect succeeds but the subsequent
- # close() throws (e.g. broken pipe on a half-open socket). The bare
- # `except Exception` in _check_heartbeat() catches it and reports
- # "server down", and a regression that narrows that handler would be
- # caught here.
+ def
test_returns_true_when_connection_succeeds_but_socket_close_raises(self):
hb = make_heartbeat()
fake_sock = MagicMock()
fake_sock.close.side_effect = OSError("close failed")
@@ -92,7 +87,8 @@ class TestCheckHeartbeat:
"core.runnables.heartbeat.socket.create_connection",
return_value=fake_sock,
):
- assert hb._check_heartbeat() is False
+ assert hb._check_heartbeat() is True
+ fake_sock.close.assert_called_once()
class TestRunEarlyExit: