Lunderberg commented on code in PR #10909:
URL: https://github.com/apache/tvm/pull/10909#discussion_r843993790
##########
python/tvm/contrib/hexagon/build.py:
##########
@@ -360,10 +361,6 @@ def _copy_binaries(self):
def _run_server_script(self):
"""Setup the ADB connection and execute the server script."""
- # Removed pre-defined forward/reverse rules
- subprocess.check_call(self._adb_device_sub_cmd + ["forward",
"--remove-all"])
Review Comment:
Do we want to clean up port forwards that have been opened for more than N
days? The `.stop_server()` method may not be called if a segfault occurs, and
that may leave ports open unnecessarily.
##########
python/tvm/contrib/hexagon/build.py:
##########
@@ -386,21 +384,46 @@ def _run_server_script(self):
stderr=subprocess.PIPE,
)
+ def _cleanup_port_forwarding(self):
+ # Removed pre-defined forward/reverse rules
+ rpc_tracker_port = self._rpc_info["rpc_tracker_port"]
+ subprocess.check_call(
+ self._adb_device_sub_cmd + ["reverse", "--remove",
f"tcp:{rpc_tracker_port}"]
+ )
+ for port in self.forwarded_ports_:
+ subprocess.check_call(self._adb_device_sub_cmd + ["forward",
"--remove", f"tcp:{port}"])
+
+ def _terminate_remote(self):
+ # Send interupt to main and child processes
+ SIGINT = 2
+ subprocess.Popen(
+ self._adb_device_sub_cmd
+ + ["shell", f"pkill -l{SIGINT} -P `cat
{self._workspace}/rpc_pid.txt`"]
+ )
+ subprocess.Popen(
+ self._adb_device_sub_cmd
+ + ["shell", f"kill -s sigint `cat {self._workspace}/rpc_pid.txt`"]
+ )
+ subprocess.Popen(
+ self._adb_device_sub_cmd
+ + ["shell", f"kill -s sigint `cat {self._workspace}/rpc_pid.txt`"]
+ )
+ # Wait for processes to destruct cleanly after receiving the intrupt
+ subprocess.Popen(self._adb_device_sub_cmd + ["shell", "sleep", "0.1s"])
+ # Kill process children
+ self._adb_device_sub_cmd + ["shell", f"pkill -P `cat
{self._workspace}/rpc_pid.txt`"]
Review Comment:
I wonder if we could set the process group for the subprocesses launched by
`PopenWorker`. Those are the ones that tend to be left-over. The `pkill -P`
command would send signals to the direct children of the main process, but not
to any grandchildren (e.g. main process starts a worker subprocess, which
starts a compilation subprocess), or to any processes started after the child
process is killed but before the parent process is killed (e.g. main process
interprets the closing child as a failed test case, starts a new test case).
It [looks like](https://pymotw.com/2/subprocess/#process-groups-sessions) it
would be relatively straightfoward to have `Popen` calls started as the same
process group, so that they would all receive the signal simultaneously.
##########
python/tvm/contrib/hexagon/build.py:
##########
@@ -386,21 +384,46 @@ def _run_server_script(self):
stderr=subprocess.PIPE,
)
+ def _cleanup_port_forwarding(self):
+ # Removed pre-defined forward/reverse rules
+ rpc_tracker_port = self._rpc_info["rpc_tracker_port"]
+ subprocess.check_call(
+ self._adb_device_sub_cmd + ["reverse", "--remove",
f"tcp:{rpc_tracker_port}"]
+ )
+ for port in self.forwarded_ports_:
+ subprocess.check_call(self._adb_device_sub_cmd + ["forward",
"--remove", f"tcp:{port}"])
+
+ def _terminate_remote(self):
+ # Send interupt to main and child processes
+ SIGINT = 2
+ subprocess.Popen(
+ self._adb_device_sub_cmd
+ + ["shell", f"pkill -l{SIGINT} -P `cat
{self._workspace}/rpc_pid.txt`"]
Review Comment:
I think pkill also supports named signals, so we can pass `-l SIGINT`
instead of needing the string format.
##########
python/tvm/contrib/hexagon/build.py:
##########
@@ -373,9 +370,10 @@ def _run_server_script(self):
)
# Enable port forward for RPC server. We forward 9 ports after the
rpc_server_port.
for i in range(0, 10):
+ port = rpc_server_port + i
+ self.forwarded_ports_.append(port)
subprocess.check_call(
- self._adb_device_sub_cmd
- + ["forward", f"tcp:{rpc_server_port+i}",
f"tcp:{rpc_server_port+i}"]
+ self._adb_device_sub_cmd + ["forward", f"tcp:{port}",
f"tcp:{port}"]
Review Comment:
If a port forward already exists, does `adb` exit with a non-zero return
code? If the `adb` call exits with a zero return code in that case, then there
are two processes trying to start port forwards on overlapping ranges would
result in double clean-up occurring.
##########
python/tvm/contrib/hexagon/build.py:
##########
@@ -373,9 +370,10 @@ def _run_server_script(self):
)
# Enable port forward for RPC server. We forward 9 ports after the
rpc_server_port.
for i in range(0, 10):
+ port = rpc_server_port + i
+ self.forwarded_ports_.append(port)
subprocess.check_call(
- self._adb_device_sub_cmd
- + ["forward", f"tcp:{rpc_server_port+i}",
f"tcp:{rpc_server_port+i}"]
+ self._adb_device_sub_cmd + ["forward", f"tcp:{port}",
f"tcp:{port}"]
Review Comment:
Also, I think we should move the `self.forwarded_ports_.append(port)` to be
after the subprocess call. That way, a cleanup that occurs as the stack
unwinds (e.g. [the `hexagon_launcher`
fixture](https://github.com/apache/tvm/blob/main/tests/python/contrib/test_hexagon/conftest.py#L184))
wouldn't try to clean up the failed port forward.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]