areusch commented on issue #8728:
URL: https://github.com/apache/tvm/issues/8728#issuecomment-912444859


   The current thinking on these failures is that it's potentially a QEMU-side 
issue that is triggered by running many tests in a row. This test started 
failing with the #8595 effort. I haven't been able to reproduce it locally in 
the same way it fails in the CI. However, at one point during development, 
we were failing to kill the QEMU process, thereby leaving ~10 
qemu-system-arm processes running. On a less powerful machine, that could 
reproduce a similar effect. 
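
   To check whether leftover emulator processes are piling up on a machine, something like the following could help. It is only a sketch: it assumes a `ps` that accepts `-eo pid,comm` (as procps does on Linux), and the helper names `parse_stray_pids` and `list_stray_pids` are made up for illustration and are not part of TVM.

```python
import subprocess

# Process name mentioned above; the helper names below are hypothetical.
PROCESS_NAME = "qemu-system-arm"


def parse_stray_pids(ps_output, name=PROCESS_NAME):
    """Return the PIDs in `ps -eo pid,comm` style output whose command equals `name`."""
    pids = []
    for line in ps_output.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # blank or malformed line
        pid, comm = parts
        # The header line ("PID COMMAND") is skipped because "PID" is not numeric.
        if pid.isdigit() and comm.strip() == name:
            pids.append(int(pid))
    return pids


def list_stray_pids(name=PROCESS_NAME):
    """Run ps and return the PIDs of processes named `name` (assumes ps is available)."""
    result = subprocess.run(
        ["ps", "-eo", "pid,comm"], capture_output=True, text=True, check=True
    )
    return parse_stray_pids(result.stdout, name)
```

   Calling `list_stray_pids()` before and after a test run would show whether any qemu-system-arm instances survived the run.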
   
   When I was able to reproduce this locally, I connected to the emulated ARM 
system via gdb and reset it. This timeout occurs while TVM is waiting to read 
an RPC reply from the emulated serial port. I was able to observe that, after 
resetting the emulated ARM system (while TVM was waiting for a reply), the 
rebooted ARM system could actually send UART data and TVM could read that data. 
However, the ARM system did not seem to be able to receive UART data from TVM.
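
   For anyone not familiar with the failure mode, the host-side wait looks roughly like a blocking read with a deadline. The snippet below is a generic illustration using `select` on a file descriptor, not TVM's actual transport code; `read_with_timeout` is a hypothetical helper, and the example assumes a POSIX system where `select` works on pipes and serial descriptors.

```python
import os
import select


def read_with_timeout(fd, nbytes, timeout_sec):
    """Read up to nbytes from fd, raising TimeoutError if no data arrives in time."""
    ready, _, _ = select.select([fd], [], [], timeout_sec)
    if not ready:
        # Nothing arrived before the deadline: the peer never replied.
        raise TimeoutError(f"no data on fd {fd} after {timeout_sec}s")
    return os.read(fd, nbytes)


# A pipe stands in for the emulated serial port in this illustration.
read_fd, write_fd = os.pipe()
os.write(write_fd, b"reply")
data = read_with_timeout(read_fd, 16, timeout_sec=0.5)  # data is available, returns at once

try:
    read_with_timeout(read_fd, 16, timeout_sec=0.1)  # pipe is now empty
    timed_out = False
except TimeoutError:
    timed_out = True
```

   If the device never receives the request, as described above, it never sends a reply and the host ends up in the `TimeoutError` branch.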
   
   This leads me to suspect one of two root causes:
   1) a bug in the Zephyr firmware for MPS2_AN521 that may only manifest itself 
on QEMU
   2) a race condition in the Zephyr QEMU UART emulation that fails to transmit 
data *to* the ARM system, but can read data back from it just fine.
   
   @gromero was investigating this before I went on PTO on Aug 19. Any updates? 
Perhaps we should consider temporarily disabling mps2_an521 until we can 
root-cause and fix this.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.


