giuseros opened a new issue #5514:
URL: https://github.com/apache/incubator-tvm/issues/5514


   Hi all,
   I am running TVM on an Ubuntu 16.04 machine, and the RPC tracker runs on the same machine.
   
   An aarch64 machine is connected to the tracker. 
   
   When running from the master branch, the following Python code: 
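   ```python
   import numpy as np
   import tvm
   from tvm import autotvm
   # device_key, device_tracker and device_port identify the aarch64 board registered with the tracker
   remote = autotvm.measure.request_remote(device_key, device_tracker, device_port, timeout=10000)
   ctx = remote.cpu()
   a = tvm.nd.array(np.ones((5041, 720)).astype('float32'), ctx)
   b = tvm.nd.array(np.ones((720, 192)).astype('float32'), ctx)
   ```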
   
   produces the following error on the server: `free(): invalid next size (normal)`
   
   On the host side, I get this error instead:
   ```
   Traceback (most recent call last):
     File "tvm/python/tvm/runtime/ndarray.py", line 503, in array
       return empty(arr.shape, arr.dtype, ctx).copyfrom(arr)
   
     File "tvm/python/tvm/runtime/ndarray.py", line 145, in copyfrom
       check_call(_LIB.TVMArrayCopyFromBytes(self.handle, data, nbytes))
   
     File "tvm/python/tvm/_ffi/base.py", line 330, in check_call
       raise get_last_ffi_error()
   
   tvm._ffi.base.TVMError: Traceback (most recent call last):
     [bt] (7) tvm/build/libtvm.so(TVMArrayCopyFromBytes+0xa) [0x7f808df5397a]
     [bt] (6) tvm/build/libtvm.so(tvm::runtime::ArrayCopyFromBytes(DLTensor*, void const*, unsigned long)+0x7c4) [0x7f808df537c4]
     [bt] (5) tvm/build/libtvm.so(tvm::runtime::RPCDeviceAPI::CopyDataFromTo(void const*, unsigned long, void*, unsigned long, unsigned long, DLContext, DLContext, DLDataType, void*)+0x42f) [0x7f808df97e7f]
     [bt] (4) tvm/build/libtvm.so(tvm::runtime::RPCSession::CopyToRemote(void*, unsigned long, void*, unsigned long, unsigned long, DLContext, DLDataType)+0x28f) [0x7f808df8400f]
     [bt] (3) tvm/build/libtvm.so(tvm::runtime::RPCSession::HandleUntilReturnEvent(tvm::runtime::TVMRetValue*, bool, tvm::runtime::PackedFunc const*)+0x13f) [0x7f808df835ef]
     [bt] (2) tvm/build/libtvm.so(+0xd3824c) [0x7f808df9024c]
     [bt] (1) tvm/build/libtvm.so(tvm::support::Socket::Error(char const*)+0x90) [0x7f808df85220]
     [bt] (0) tvm/build/libtvm.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x32) [0x7f808d63dff2]
     File "/workspace/src/runtime/rpc/../../support/socket.h", line 362
   TVMError: Socket SockChannel::Recv Error:Connection reset by peer
   ```
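The host-side trace shows the connection dropping inside `RPCSession::CopyToRemote`, that is, while array data is being sent to the remote. For scale, here is a small sketch of how much raw data each `tvm.nd.array` call above has to transfer. This is plain Python arithmetic, not a TVM API. `payload_bytes` is a hypothetical helper. The sketch assumes 4-byte float32 elements and ignores any RPC framing overhead.

```python
import math

# Assumption: float32 elements occupy 4 bytes each
FLOAT32_BYTES = 4


def payload_bytes(shape, itemsize=FLOAT32_BYTES):
    """Raw tensor data size in bytes for a dense array of the given shape (hypothetical helper)."""
    return math.prod(shape) * itemsize


# Shapes taken from the reproducer above
shapes = {"a": (5041, 720), "b": (720, 192)}
for name, shape in shapes.items():
    n = payload_bytes(shape)
    print(f"{name}: shape={shape}, {n} bytes ({n / 2**20:.2f} MiB)")
```

With these shapes, `a` carries 14,518,080 bytes (about 13.85 MiB) and `b` carries 552,960 bytes (about 0.53 MiB).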
   I investigated the issue and traced it to this commit: https://github.com/apache/incubator-tvm/commit/afcf9397b60ae7ccf46601cf29828992ca9d5f57
   
   The commit immediately before it (9a8ed5b7abacfdb6a605f3ccd412fd929455fb15) works fine.
   
   Any thoughts on what can be causing the issue?
   
   I am cc'ing @jmorrill, the author of the PR that introduced that commit.
   
   Thanks,
   Giuseppe  
   
   P.S. I also started a Discuss thread here: https://discuss.tvm.ai/t/rpc-error-for-large-arrays/6591
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.