Lunderberg commented on code in PR #10909:
URL: https://github.com/apache/tvm/pull/10909#discussion_r843993790


##########
python/tvm/contrib/hexagon/build.py:
##########
@@ -360,10 +361,6 @@ def _copy_binaries(self):
     def _run_server_script(self):
         """Setup the ADB connection and execute the server script."""
 
-        # Removed pre-defined forward/reverse rules
-        subprocess.check_call(self._adb_device_sub_cmd + ["forward", 
"--remove-all"])

Review Comment:
   Do we want to clean up port forwards that have been opened for more than N 
days?  The `.stop_server()` method may not be called if a segfault occurs, and 
that may leave ports open unnecessarily.



##########
python/tvm/contrib/hexagon/build.py:
##########
@@ -386,21 +384,46 @@ def _run_server_script(self):
             stderr=subprocess.PIPE,
         )
 
+    def _cleanup_port_forwarding(self):
+        # Removed pre-defined forward/reverse rules
+        rpc_tracker_port = self._rpc_info["rpc_tracker_port"]
+        subprocess.check_call(
+            self._adb_device_sub_cmd + ["reverse", "--remove", 
f"tcp:{rpc_tracker_port}"]
+        )
+        for port in self.forwarded_ports_:
+            subprocess.check_call(self._adb_device_sub_cmd + ["forward", 
"--remove", f"tcp:{port}"])
+
+    def _terminate_remote(self):
+        # Send interupt to main and child processes
+        SIGINT = 2
+        subprocess.Popen(
+            self._adb_device_sub_cmd
+            + ["shell", f"pkill -l{SIGINT} -P `cat 
{self._workspace}/rpc_pid.txt`"]
+        )
+        subprocess.Popen(
+            self._adb_device_sub_cmd
+            + ["shell", f"kill -s sigint `cat {self._workspace}/rpc_pid.txt`"]
+        )
+        subprocess.Popen(
+            self._adb_device_sub_cmd
+            + ["shell", f"kill -s sigint `cat {self._workspace}/rpc_pid.txt`"]
+        )
+        # Wait for processes to destruct cleanly after receiving the intrupt
+        subprocess.Popen(self._adb_device_sub_cmd + ["shell", "sleep", "0.1s"])
+        # Kill process children
+        self._adb_device_sub_cmd + ["shell", f"pkill -P `cat 
{self._workspace}/rpc_pid.txt`"]

Review Comment:
   I wonder if we could set the process group for the subprocesses launched by 
`PopenWorker`.  Those are the ones that tend to be left-over.  The `pkill -P` 
command would send signals to the direct children of the main process, but not 
to any grandchildren (e.g. main process starts a worker subprocess, which 
starts a compilation subprocess), or to any processes started after the child 
process is killed but before the parent process is killed (e.g. main process 
interprets the closing child as a failed test case, starts a new test case).  
It [looks like](https://pymotw.com/2/subprocess/#process-groups-sessions) it 
would be relatively straightfoward to have `Popen` calls started as the same 
process group, so that they would all receive the signal simultaneously.



##########
python/tvm/contrib/hexagon/build.py:
##########
@@ -386,21 +384,46 @@ def _run_server_script(self):
             stderr=subprocess.PIPE,
         )
 
+    def _cleanup_port_forwarding(self):
+        # Removed pre-defined forward/reverse rules
+        rpc_tracker_port = self._rpc_info["rpc_tracker_port"]
+        subprocess.check_call(
+            self._adb_device_sub_cmd + ["reverse", "--remove", 
f"tcp:{rpc_tracker_port}"]
+        )
+        for port in self.forwarded_ports_:
+            subprocess.check_call(self._adb_device_sub_cmd + ["forward", 
"--remove", f"tcp:{port}"])
+
+    def _terminate_remote(self):
+        # Send interupt to main and child processes
+        SIGINT = 2
+        subprocess.Popen(
+            self._adb_device_sub_cmd
+            + ["shell", f"pkill -l{SIGINT} -P `cat 
{self._workspace}/rpc_pid.txt`"]

Review Comment:
   I think pkill also supports named signals, so we can pass `-l SIGINT` 
instead of needing the string format.



##########
python/tvm/contrib/hexagon/build.py:
##########
@@ -373,9 +370,10 @@ def _run_server_script(self):
         )
         # Enable port forward for RPC server. We forward 9 ports after the 
rpc_server_port.
         for i in range(0, 10):
+            port = rpc_server_port + i
+            self.forwarded_ports_.append(port)
             subprocess.check_call(
-                self._adb_device_sub_cmd
-                + ["forward", f"tcp:{rpc_server_port+i}", 
f"tcp:{rpc_server_port+i}"]
+                self._adb_device_sub_cmd + ["forward", f"tcp:{port}", 
f"tcp:{port}"]

Review Comment:
   If a port forward already exists, does `adb` exit with a non-zero return 
code?  If the `adb` call exits with a zero return code in that case, then there 
are two processes trying to start port forwards on overlapping ranges would 
result in double clean-up occurring.



##########
python/tvm/contrib/hexagon/build.py:
##########
@@ -373,9 +370,10 @@ def _run_server_script(self):
         )
         # Enable port forward for RPC server. We forward 9 ports after the 
rpc_server_port.
         for i in range(0, 10):
+            port = rpc_server_port + i
+            self.forwarded_ports_.append(port)
             subprocess.check_call(
-                self._adb_device_sub_cmd
-                + ["forward", f"tcp:{rpc_server_port+i}", 
f"tcp:{rpc_server_port+i}"]
+                self._adb_device_sub_cmd + ["forward", f"tcp:{port}", 
f"tcp:{port}"]

Review Comment:
   Also, I think we should move the `self.forwarded_ports_.append(port)` to be 
after the subprocess call.  That way, a cleanup that occurs as the stack 
unwinds (e.g. [the `hexagon_launcher` 
fixture](https://github.com/apache/tvm/blob/main/tests/python/contrib/test_hexagon/conftest.py#L184))
 wouldn't try to clean up the failed port forward.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to