giuseros opened a new issue #5514:
URL: https://github.com/apache/incubator-tvm/issues/5514
Hi all,
I am running TVM on an Ubuntu 16.04 machine, and the tracker is running
on the same machine.
An aarch64 machine is connected to the tracker.
When running from the master branch, the following Python code:
```
import numpy as np
import tvm
from tvm import autotvm

# device_key, device_tracker and device_port are set elsewhere
remote = autotvm.measure.request_remote(device_key, device_tracker,
                                        device_port, timeout=10000)
ctx = remote.cpu()
a = tvm.nd.array(np.ones((5041, 720)).astype('float32'), ctx)
b = tvm.nd.array(np.ones((720, 192)).astype('float32'), ctx)
```
produces the following error on the server: `free(): invalid next size (normal)`.
On the host side, I get this error instead:
```
Traceback (most recent call last):
  File "tvm/python/tvm/runtime/ndarray.py", line 503, in array
    return empty(arr.shape, arr.dtype, ctx).copyfrom(arr)
  File "tvm/python/tvm/runtime/ndarray.py", line 145, in copyfrom
    check_call(_LIB.TVMArrayCopyFromBytes(self.handle, data, nbytes))
  File "tvm/python/tvm/_ffi/base.py", line 330, in check_call
    raise get_last_ffi_error()
tvm._ffi.base.TVMError: Traceback (most recent call last):
  [bt] (7) tvm/build/libtvm.so(TVMArrayCopyFromBytes+0xa) [0x7f808df5397a]
  [bt] (6) tvm/build/libtvm.so(tvm::runtime::ArrayCopyFromBytes(DLTensor*, void const*, unsigned long)+0x7c4) [0x7f808df537c4]
  [bt] (5) tvm/build/libtvm.so(tvm::runtime::RPCDeviceAPI::CopyDataFromTo(void const*, unsigned long, void*, unsigned long, unsigned long, DLContext, DLContext, DLDataType, void*)+0x42f) [0x7f808df97e7f]
  [bt] (4) tvm/build/libtvm.so(tvm::runtime::RPCSession::CopyToRemote(void*, unsigned long, void*, unsigned long, unsigned long, DLContext, DLDataType)+0x28f) [0x7f808df8400f]
  [bt] (3) tvm/build/libtvm.so(tvm::runtime::RPCSession::HandleUntilReturnEvent(tvm::runtime::TVMRetValue*, bool, tvm::runtime::PackedFunc const*)+0x13f) [0x7f808df835ef]
  [bt] (2) tvm/build/libtvm.so(+0xd3824c) [0x7f808df9024c]
  [bt] (1) tvm/build/libtvm.so(tvm::support::Socket::Error(char const*)+0x90) [0x7f808df85220]
  [bt] (0) tvm/build/libtvm.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x32) [0x7f808d63dff2]
  File "/workspace/src/runtime/rpc/../../support/socket.h", line 362
TVMError: Socket SockChannel::Recv Error:Connection reset by peer
```
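For reference, the size of each payload sent to the remote can be worked out locally, without the tracker or the device. The snippet below is only a minimal sketch: `payload_nbytes` is a hypothetical helper (not part of TVM), and it assumes dense `float32` data at 4 bytes per element, as in the reproducer above. It does not show which of the two copies triggers the failure.

```python
from math import prod

FLOAT32_ITEMSIZE = 4  # bytes per float32 element


def payload_nbytes(shape, itemsize=FLOAT32_ITEMSIZE):
    """Hypothetical helper: bytes needed for a dense array of this shape."""
    return prod(shape) * itemsize


# Shapes taken from the reproducer above.
for name, shape in (("a", (5041, 720)), ("b", (720, 192))):
    nbytes = payload_nbytes(shape)
    print(f"{name}: shape={shape}, {nbytes} bytes ({nbytes / 2**20:.2f} MiB)")
```

So `a` is 14,518,080 bytes and `b` is 552,960 bytes.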
I investigated the issue and traced it to this commit:
https://github.com/apache/incubator-tvm/commit/afcf9397b60ae7ccf46601cf29828992ca9d5f57
The commit before it (9a8ed5b7abacfdb6a605f3ccd412fd929455fb15) works fine.
Any thoughts on what can be causing the issue?
I am cc'ing @jmorrill who is the author of the aforementioned PR.
Thanks,
Giuseppe
P.S. I also started a discuss post here:
https://discuss.tvm.ai/t/rpc-error-for-large-arrays/6591