The GitHub Actions job "Required Checks" on 
texera.git/gh-readonly-queue/main/pr-5027-7bd65500562724eced01d87c254fdab1e9474e7b
 has failed.
Run started by GitHub user Yicong-Huang (triggered by Yicong-Huang).

Head commit for run:
469e3874c9fb0b167ca190bd57078622488cb721 / nathant27 <[email protected]>
fix(amber): avoids false negative on _check_heartbeat when connection is 
established but close raises exception (#5027)

<!--
Thanks for sending a pull request (PR)! Here are some tips for you:
1. If this is your first time, please read our contributor guidelines:
[Contributing to
Texera](https://github.com/apache/texera/blob/main/CONTRIBUTING.md)
  2. Ensure you have added or run the appropriate tests for your PR
  3. If the PR is work in progress, mark it a draft on GitHub.
  4. Please write your PR title to summarize what this PR proposes, we 
    are following Conventional Commits style for PR titles as well.
  5. Be sure to keep the PR description updated to reflect all changes.
-->

### What changes were proposed in this PR?
<!--
Please clarify what changes you are proposing. The purpose of this
section
is to outline the changes. Here are some tips for you:
  1. If you propose a new API, clarify the use case for a new API.
  2. If you fix a bug, you can clarify why it is a bug.
  3. If it is a refactoring, clarify what has been changed.
  3. It would be helpful to include a before-and-after comparison using 
     screenshots or GIFs.
  4. Please consider writing useful notes for better and faster reviews.
-->
Old version of Heartbeat._check_heartbeat(self) wraps the socket
connection and socket close in a single try except, and handles with the
same try except. The issue is that if close raises after a successful
connection, there is a false negative since it will handle it as if
socket connection had failed. In Heartbeat.run(self), this could cause
the server to shutdown on close raising exception since _check_heartbeat
will return False in this case.
New version separates the connection handling and the close in two
adjacent try except blocks, and returns true as long as connection is
successfully established as well, getting rid of the false negative.

As per discussed in the issue, will keep the double check for
_check_heartbeat in Heartbeat.run.

Old Version:
```
    def _check_heartbeat(self) -> bool:
        """
        Attempt to connect to JVM on the specific port. If succeeds, it means 
the
        socket is still available and the JVM is still alive. Otherwise, the JVM
        might have been gone.

        :return: bool, indicating if the socket is available.
        """
        try:
            temp_socket = socket.create_connection(
                (self._parsed_server_host, self._parsed_server_port), timeout=1
            )
            temp_socket.close()
            return True
        except Exception as e:
            logger.warning(f"Server is down with exception: {e}")
            return False
```

New Version:
```
    def _check_heartbeat(self) -> bool:
        """
        Attempt to connect to the JVM port. If connection failure, then JVM is 
dead, or if connection success 
        then JVM is alive even if close() raises. Logs on connection failure 
and on close error.

        :return: bool, True if connect succeeded, False if connect failed.
        """
        try:
            temp_socket = socket.create_connection(
                (self._parsed_server_host, self._parsed_server_port), timeout=1
            )
        except Exception as e:
            logger.warning(f"Server is down with exception: {e}")
            return False

        try:
            temp_socket.close()
        except Exception as e:
            logger.warning(f"Failed to close socket: {e}")
        
        return True
```

### Any related issues, documentation, discussions?
Closes #4726 

### How was this PR tested?
Changed old test to reflect new behavior(True on connection succeeds but
socket close rasises)
OLD TEST(REMOVED):
```
    def test_returns_false_when_socket_close_raises(self):
        # Pins the false-negative path: connect succeeds but the subsequent
        # close() throws (e.g. broken pipe on a half-open socket). The bare
        # `except Exception` in _check_heartbeat() catches it and reports
        # "server down", and a regression that narrows that handler would be
        # caught here.
        hb = make_heartbeat()
        fake_sock = MagicMock()
        fake_sock.close.side_effect = OSError("close failed")
        with patch(
            "core.runnables.heartbeat.socket.create_connection",
            return_value=fake_sock,
        ):
            assert hb._check_heartbeat() is False
```
NEW TEST:
```
    def 
test_returns_true_when_connection_succeeds_but_socket_close_raises(self):
        hb = make_heartbeat()
        fake_sock = MagicMock()
        fake_sock.close.side_effect = OSError("close failed")
        with patch(
            "core.runnables.heartbeat.socket.create_connection",
            return_value=fake_sock,
        ):
            assert hb._check_heartbeat() is True
```
Passes on existing old tests

### Was this PR authored or co-authored using generative AI tooling?
No

Report URL: https://github.com/apache/texera/actions/runs/26591490617

With regards,
GitHub Actions via GitBox

Reply via email to