Robert Nishihara created ARROW-2308:
---------------------------------------
Summary: Serialized tensor data should be 64-byte aligned.
Key: ARROW-2308
URL: https://issues.apache.org/jira/browse/ARROW-2308
Project: Apache Arrow
Issue Type: Improvement
Components: Python
Reporter: Robert Nishihara
See [https://github.com/ray-project/ray/issues/1658] for an example of this
issue. Non-aligned data can trigger a copy when it is fed into TensorFlow or
similar libraries.
{code}
import pyarrow as pa
import numpy as np
x = np.zeros(10)
# Round-trip the array through Arrow serialization.
y = pa.deserialize(pa.serialize(x).to_buffer())
x.ctypes.data % 64  # 0 (in this run, x starts out aligned)
y.ctypes.data % 64  # 48 (after the round trip it is no longer aligned)
{code}
It should be possible to fix this by calling something like
{{RETURN_NOT_OK(AlignStreamPosition(dst));}} before writing the array data.
Note that we already do this before writing the tensor header, but the tensor
header is not necessarily a multiple of 64 bytes, so the subsequent data can be
unaligned.
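The effect of the extra alignment call can be sketched in plain Python, without touching Arrow itself. This is only an illustration under stated assumptions: {{align_stream_position}} and {{write_tensor}} are hypothetical stand-ins for the C++ code path, the 112-byte header size is made up (chosen only because 112 % 64 = 48), and the stream is assumed to start at an aligned address, so an aligned offset implies an aligned pointer.

```python
import io

ALIGNMENT = 64  # bytes; the alignment requested in this issue


def padding_needed(position: int, alignment: int = ALIGNMENT) -> int:
    """Number of pad bytes that move `position` up to the next multiple of `alignment`."""
    return (-position) % alignment


def align_stream_position(stream: io.BytesIO, alignment: int = ALIGNMENT) -> None:
    """Hypothetical Python analogue of AlignStreamPosition: pad with zero bytes."""
    stream.write(b"\x00" * padding_needed(stream.tell(), alignment))


def write_tensor(stream: io.BytesIO, header: bytes, data: bytes, align_data: bool) -> int:
    """Write header then data and return the offset at which the data starts."""
    align_stream_position(stream)      # alignment before the header (already done)
    stream.write(header)
    if align_data:
        align_stream_position(stream)  # the proposed extra call before the data
    data_offset = stream.tell()
    stream.write(data)
    return data_offset


header = b"h" * 112  # made-up header size that is not a multiple of 64
data = bytes(80)     # 10 float64 zeros, as in the example above

unaligned = write_tensor(io.BytesIO(), header, data, align_data=False)
aligned = write_tensor(io.BytesIO(), header, data, align_data=True)
print(unaligned % 64)  # 48
print(aligned % 64)    # 0
```

With the second call in place, the data offset is rounded up from 112 to 128, at the cost of 16 padding bytes in this example.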