[
https://issues.apache.org/jira/browse/ARROW-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16417253#comment-16417253
]
ASF GitHub Bot commented on ARROW-2308:
---------------------------------------
wesm commented on issue #1802: ARROW-2308: [Python] Make deserialized numpy
arrays 64-byte aligned.
URL: https://github.com/apache/arrow/pull/1802#issuecomment-376865586
Will review this when I can. I should also revive ARROW-1860, as there are a
number of interrelated issues around this stuff.
----------------------------------------------------------------
> Serialized tensor data should be 64-byte aligned.
> -------------------------------------------------
>
> Key: ARROW-2308
> URL: https://issues.apache.org/jira/browse/ARROW-2308
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Python
> Reporter: Robert Nishihara
> Priority: Major
> Labels: pull-request-available
>
> See [https://github.com/ray-project/ray/issues/1658] for an example of this
> issue. Non-aligned data can trigger a copy when fed into TensorFlow and
> similar libraries.
> {code}
> import pyarrow as pa
> import numpy as np
> x = np.zeros(10)
> y = pa.deserialize(pa.serialize(x).to_buffer())
> x.ctypes.data % 64 # 0 (it starts out aligned)
> y.ctypes.data % 64 # 48 (it is no longer aligned)
> {code}
> It should be possible to fix this by calling something like
> {{RETURN_NOT_OK(AlignStreamPosition(dst));}} before writing the array data.
> Note that we already do this before writing the tensor header, but the tensor
> header is not necessarily a multiple of 64 bytes, so the subsequent data can
> be unaligned.
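
The offset arithmetic described above can be sketched in plain Python. This is only an illustration of the idea, not Arrow's implementation: `padding_needed`, `align_stream_position`, and `write_tensor` are hypothetical helpers, and the 48-byte header is an assumed size picked because it is not a multiple of 64, not the size of the real tensor header.

```python
import io

ALIGNMENT = 64  # bytes; the alignment the issue asks for


def padding_needed(position: int, alignment: int = ALIGNMENT) -> int:
    """Return how many bytes must be added to reach the next aligned offset."""
    return (-position) % alignment


def align_stream_position(stream: io.BytesIO, alignment: int = ALIGNMENT) -> None:
    """Hypothetical stand-in for AlignStreamPosition: zero-pad to an aligned offset."""
    stream.write(b"\x00" * padding_needed(stream.tell(), alignment))


def write_tensor(header: bytes, data: bytes, align_data: bool) -> int:
    """Write the header and then the data; return the offset where the data starts."""
    stream = io.BytesIO()
    align_stream_position(stream)      # alignment before the header (already done)
    stream.write(header)
    if align_data:
        align_stream_position(stream)  # the proposed extra alignment step
    data_offset = stream.tell()
    stream.write(data)
    return data_offset


header = b"h" * 48  # assumed header size, not a multiple of 64
data = bytes(80)    # same byte count as np.zeros(10) of float64

unaligned_offset = write_tensor(header, data, align_data=False)
aligned_offset = write_tensor(header, data, align_data=True)
print(unaligned_offset % 64, aligned_offset % 64)
```

With the assumed 48-byte header, the data starts at offset 48 without the extra step and at offset 64 with it, at the cost of 16 bytes of padding. An aligned offset within the buffer only yields an aligned pointer if the buffer itself starts at a 64-byte-aligned address.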