[https://issues.apache.org/jira/browse/ARROW-399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15723667#comment-15723667]
Wes McKinney commented on ARROW-399:
------------------------------------
Interesting point. I'm not sure what the best thing to do there is. The idea
of the padding was to ensure that AVX512 instructions are always safe on the
whole buffer, so the padding could either go in the metadata or simply be part
of the in-memory and IPC format (something like "all buffers are padded to 64
bytes, but the metadata reports only the used portion of the buffer"). We
should discuss this in a separate JIRA.
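A minimal sketch of the "pad every buffer to 64 bytes, but report only the used portion" convention described above. It assumes a fixed 64-byte alignment; the class and method names are hypothetical and not part of Arrow's API:

```java
public final class PaddingSketch {
    // Assumed alignment: 64 bytes, the width of one AVX-512 register.
    static final long ALIGNMENT = 64;

    // Rounds usedBytes up to the next multiple of ALIGNMENT (must be a power of two).
    static long paddedSize(long usedBytes) {
        return (usedBytes + ALIGNMENT - 1) & ~(ALIGNMENT - 1);
    }

    public static void main(String[] args) {
        // A length-7 list has 8 int32 offsets, so 32 bytes are actually used.
        long used = 8L * Integer.BYTES;
        // The allocation is padded, while the metadata would still report `used`.
        System.out.println("used=" + used + " padded=" + paddedSize(used));
    }
}
```

Under this convention a reader never derives lengths from the allocated size; the padded tail exists only so that wide SIMD loads stay in bounds.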
> [Java] ListVector.loadFieldBuffers ignores the ArrowFieldNode length metadata
> -----------------------------------------------------------------------------
>
> Key: ARROW-399
> URL: https://issues.apache.org/jira/browse/ARROW-399
> Project: Apache Arrow
> Issue Type: Bug
> Components: Java - Vectors
> Reporter: Wes McKinney
> Assignee: Julien Le Dem
> Priority: Blocker
> Attachments: list_error.json
>
>
> Discovered this during integration testing. Because Arrow-C++ writes buffers
> padded to 64 bytes, they may appear larger to the Java library than they need
> to be. In ListVector.loadFieldBuffers, the ArrowFieldNode is never used:
> {code:language=java}
> @Override
> public void loadFieldBuffers(ArrowFieldNode fieldNode, List<ArrowBuf> ownBuffers) {
>   BaseDataValueVector.load(getFieldInnerVectors(), ownBuffers);
> }
> {code}
> The value count of the resulting ListVector is thus inferred from the size of
> the offsets buffer. For a length-7 list vector in C++, the offsets buffer holds
> 8 32-bit offsets (32 bytes) but is padded to exactly 64 bytes for SIMD. Java
> infers from the 64-byte size that the value count is 15 (64 / 4 - 1), and the
> integration test fails.
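A minimal sketch contrasting the failing inference with the approach the issue title points to (reading the length from the field node metadata). It uses hypothetical stand-ins rather than the real Arrow Java classes: `FieldNode` here only models the length that `ArrowFieldNode` carries, and 32-bit offsets and a 64-byte padded buffer are assumed:

```java
public final class ListLoadSketch {
    // Arrow list offsets are 32-bit integers.
    static final int OFFSET_WIDTH = Integer.BYTES;

    // Hypothetical stand-in for the length carried by ArrowFieldNode.
    static final class FieldNode {
        final int length;

        FieldNode(int length) {
            this.length = length;
        }
    }

    // Failing behavior: assumes every byte of the (possibly padded) buffer is an offset.
    static int valueCountFromBufferSize(long offsetsBufferBytes) {
        return (int) (offsetsBufferBytes / OFFSET_WIDTH) - 1;
    }

    // Uses the metadata length and only checks that the buffer is large enough,
    // so any trailing padding is ignored.
    static int valueCountFromFieldNode(FieldNode node, long offsetsBufferBytes) {
        long requiredBytes = (long) (node.length + 1) * OFFSET_WIDTH;
        if (offsetsBufferBytes < requiredBytes) {
            throw new IllegalArgumentException(
                "offsets buffer has " + offsetsBufferBytes + " bytes, need " + requiredBytes);
        }
        return node.length;
    }

    public static void main(String[] args) {
        long paddedBytes = 64; // 32 bytes of offsets padded to 64 by the writer
        FieldNode node = new FieldNode(7);
        System.out.println(valueCountFromBufferSize(paddedBytes));      // 15
        System.out.println(valueCountFromFieldNode(node, paddedBytes)); // 7
    }
}
```

With the length taken from the metadata, the padding no longer matters; the buffer only has to hold at least `(length + 1) * 4` bytes.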
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)