[
https://issues.apache.org/jira/browse/ARROW-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17659335#comment-17659335
]
Rok Mihevc commented on ARROW-2308:
-----------------------------------
This issue has been migrated to [issue
#18260|https://github.com/apache/arrow/issues/18260] on GitHub. Please see the
[migration documentation|https://github.com/apache/arrow/issues/14542] for
further details.
> Serialized tensor data should be 64-byte aligned.
> -------------------------------------------------
>
> Key: ARROW-2308
> URL: https://issues.apache.org/jira/browse/ARROW-2308
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Python
> Reporter: Robert Nishihara
> Assignee: Robert Nishihara
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.10.0
>
>
> See [https://github.com/ray-project/ray/issues/1658] for an example of this
> issue. Data that is not 64-byte aligned can trigger a copy when it is passed
> to TensorFlow or similar libraries.
> {code}
> import pyarrow as pa
> import numpy as np
> x = np.zeros(10)
> y = pa.deserialize(pa.serialize(x).to_buffer())
> x.ctypes.data % 64 # 0 (it starts out aligned)
> y.ctypes.data % 64 # 48 (it is no longer aligned)
> {code}
> It should be possible to fix this by calling something like
> {{RETURN_NOT_OK(AlignStreamPosition(dst));}} before writing the array data.
> Note that we already do this before writing the tensor header, but the
> length of the tensor header is not necessarily a multiple of 64 bytes, so
> the data that follows can start at an unaligned offset.
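
The padding step proposed in the description can be sketched in pure Python. This is only an illustration of the idea, not Arrow's implementation: {{align_stream_position}} and {{write_tensor}} are hypothetical helpers modelled on the {{AlignStreamPosition}} call named above, and the 80-byte header is an assumed size chosen only because it is not a multiple of 64.

```python
import io

ALIGNMENT = 64  # bytes; the alignment requested in the issue title


def align_stream_position(stream, alignment=ALIGNMENT):
    """Pad `stream` with zero bytes until its position is a multiple of `alignment`.

    Hypothetical stand-in for the AlignStreamPosition call named in the issue.
    Returns the new stream position.
    """
    remainder = stream.tell() % alignment
    if remainder:
        stream.write(b"\x00" * (alignment - remainder))
    return stream.tell()


def write_tensor(stream, header, data, align_data):
    """Write `header` then `data`; return the offset at which `data` starts."""
    align_stream_position(stream)      # alignment before the header (already done today)
    stream.write(header)
    if align_data:
        align_stream_position(stream)  # the extra step proposed in the issue
    data_offset = stream.tell()
    stream.write(data)
    return data_offset


header = b"h" * 80  # assumed header size, deliberately not a multiple of 64
data = bytes(80)    # same byte length as ten float64 zeros

unaligned = write_tensor(io.BytesIO(), header, data, align_data=False)
aligned = write_tensor(io.BytesIO(), header, data, align_data=True)
print(unaligned % ALIGNMENT, aligned % ALIGNMENT)  # prints "16 0"
```

Without the second alignment call the data starts wherever the header happens to end. With it, the writer pays at most 63 bytes of padding and the data offset is always a multiple of 64.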