[https://issues.apache.org/jira/browse/ARROW-17783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17607418#comment-17607418]

Weston Pace commented on ARROW-17783:
-------------------------------------

Hm, the [format page|https://arrow.apache.org/docs/format/Columnar.html#buffer-alignment-and-padding]
suggests the alignment requirement is not only a C++ thing:

{quote}
Implementations are recommended to allocate memory on aligned addresses 
(multiple of 8- or 64-bytes) and pad (overallocate) to a length that is a 
multiple of 8 or 64 bytes. *When serializing Arrow data for interprocess 
communication, these alignment and padding requirements are enforced.* If 
possible, we suggest that you prefer using 64-byte alignment and padding. 
Unless otherwise noted, padded bytes do not need to have a specific value.
{quote}
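A minimal sketch of what the quoted rules mean in practice, in standard C++: checking whether a buffer address is a multiple of the alignment, and rounding a buffer length up to its padded size. `IsAligned` and `PaddedLength` are hypothetical helpers for illustration, not Arrow APIs, and both assume the alignment is a power of two such as 8 or 64.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helpers, not Arrow APIs. `alignment` must be a power of two.

// True if the address `ptr` is a multiple of `alignment` bytes (e.g. 8 or 64).
inline bool IsAligned(const void* ptr, std::size_t alignment) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  return reinterpret_cast<std::uintptr_t>(ptr) % alignment == 0;
}

// Rounds `nbytes` up to the next multiple of `alignment`, i.e. the padded
// (overallocated) buffer length the format recommends.
inline std::size_t PaddedLength(std::size_t nbytes, std::size_t alignment = 64) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  return (nbytes + alignment - 1) / alignment * alignment;
}
```

For example, a 100-byte buffer padded to 64 bytes occupies 128 bytes, and 104 bytes when padded to 8.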

That being said, I guess this is just another feature request. It should be
solvable with special handling for the unaligned head and tail of each buffer.
And since the C data API is not technically "interprocess communication", we
perhaps can't rely on this guarantee anyway.
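One way to read "special head/tail handling", sketched in standard C++ under the assumption of an int64 sum over a single buffer: values before the first 64-byte boundary and after the last full 64-byte block are loaded with `std::memcpy`, which is valid at any address, and only the aligned middle goes through a typed pointer. `SumInt64` is a hypothetical function for illustration, not Arrow's actual aggregate kernel.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr std::size_t kAlign = 64;                           // target alignment in bytes
constexpr std::size_t kPerBlock = kAlign / sizeof(int64_t);  // 8 values per aligned block

// Hypothetical kernel, not Arrow's aggregate code: sums `count` int64 values
// stored contiguously at `data`, which may have any alignment.
int64_t SumInt64(const void* data, std::size_t count) {
  assert(data != nullptr || count == 0);
  const auto* bytes = static_cast<const unsigned char*>(data);

  // Unaligned-safe load of value `idx`.
  auto load = [bytes](std::size_t idx) {
    int64_t v;
    std::memcpy(&v, bytes + idx * sizeof(int64_t), sizeof(v));
    return v;
  };
  // True if value `idx` starts on a 64-byte boundary.
  auto at_boundary = [bytes](std::size_t idx) {
    return reinterpret_cast<std::uintptr_t>(bytes + idx * sizeof(int64_t)) % kAlign == 0;
  };

  int64_t sum = 0;
  std::size_t i = 0;
  // Head: scalar loads until the next value starts on a 64-byte boundary.
  while (i < count && !at_boundary(i)) sum += load(i++);
  // Body: whole aligned blocks of 8 values read through a typed pointer.
  for (; i + kPerBlock <= count; i += kPerBlock) {
    const auto* block = reinterpret_cast<const int64_t*>(bytes + i * sizeof(int64_t));
    for (std::size_t j = 0; j < kPerBlock; ++j) sum += block[j];
  }
  // Tail: remaining values after the last full block.
  for (; i < count; ++i) sum += load(i);
  return sum;
}
```

If the buffer is not even 8-byte aligned, the head loop never reaches a 64-byte boundary and simply processes every value, so the result stays correct and only the fast path is lost.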

> [C++] Aggregate kernel should not mandate alignment
> ---------------------------------------------------
>
>                 Key: ARROW-17783
>                 URL: https://issues.apache.org/jira/browse/ARROW-17783
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++
>    Affects Versions: 6.0.0, 8.0.0
>            Reporter: Yifei Yang
>            Assignee: Weston Pace
>            Priority: Major
>         Attachments: flight-alignment-test.zip
>
>
> When using Arrow's aggregate kernel with a table transferred from Arrow Flight
> (DoGet), it may crash at arrow::util::CheckAlignment(). However, it works well
> with the original data, and also if I first serialize the transferred table
> into bytes and then recreate an Arrow table from those bytes.
> The attached "flight-alignment-test" is a minimal test that reproduces the
> issue. It basically runs "sum(total_revenue) group by l_suppkey" on the table
> returned by "DoGet()". ("DummyNode" only serves as the producer of the
> aggregate node, since a producer is required to create the aggregate node.)
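The reporter's observation that serializing the transferred table and rebuilding it from bytes avoids the crash (presumably because the rebuilt buffers land on aligned addresses) suggests a generic workaround: copy any unaligned buffer into a fresh aligned allocation before the kernel sees it. A minimal sketch in standard C++17, assuming a single raw buffer; `EnsureAlignedCopy`, `AlignedBytes` and `AlignedDeleter` are hypothetical names, not Arrow APIs, and real code would have to apply this to every buffer of every array in the table.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory>
#include <new>

// Frees memory obtained from the aligned form of operator new[].
struct AlignedDeleter {
  std::align_val_t alignment;
  void operator()(unsigned char* p) const { ::operator delete[](p, alignment); }
};
using AlignedBytes = std::unique_ptr<unsigned char[], AlignedDeleter>;

// Hypothetical helper, not an Arrow API. Returns `data` itself when it is
// already aligned; otherwise copies `size` bytes into a new allocation aligned
// to `alignment`, hands ownership to `storage`, and returns the copy.
const void* EnsureAlignedCopy(const void* data, std::size_t size,
                              std::size_t alignment, AlignedBytes& storage) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  if (reinterpret_cast<std::uintptr_t>(data) % alignment == 0) return data;
  const std::align_val_t al{alignment};
  auto* copy = static_cast<unsigned char*>(::operator new[](size, al));
  std::memcpy(copy, data, size);
  storage = AlignedBytes(copy, AlignedDeleter{al});
  return copy;
}
```

The copy costs a memcpy per misaligned buffer, so the address is checked first and already-aligned buffers are passed through untouched.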



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
