[
https://issues.apache.org/jira/browse/ARROW-17836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17610137#comment-17610137
]
Weston Pace commented on ARROW-17836:
-------------------------------------
Yes, the direct I/O PR offers a generic filesystem interface, so it may have to
memcpy / buffer incoming data to satisfy alignment. The difference
with spilling is that we are already doing a memcpy higher up in the chain.
When we spill, we take the data we need to spill and partition it.
For example, suppose we add spilling to sorting and we are sorting by date. If
we've accumulated too much data, we might partition it into decade-sized
buckets and persist those to disk. Then, once all the data has arrived, we can
process a single decade at a time (in the hope that one decade of data is small
enough to fit in memory).
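To make the decade example concrete, here is a minimal sketch of the
partitioning step. It is not Arrow code: Row, DecadeOf and PartitionByDecade
are hypothetical names, and a plain integer year stands in for the date key.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical row type: the year stands in for the date sort key.
struct Row {
  int32_t year;
  int64_t payload;
};

// Bucket id for a row, e.g. 1987 -> 1980. Assumes non-negative years.
inline int32_t DecadeOf(int32_t year) {
  assert(year >= 0);
  return (year / 10) * 10;
}

// Copy every row into the bucket for its decade. Each bucket could then be
// persisted to disk and later sorted on its own.
inline std::map<int32_t, std::vector<Row>> PartitionByDecade(
    const std::vector<Row>& rows) {
  std::map<int32_t, std::vector<Row>> buckets;
  for (const Row& row : rows) {
    buckets[DecadeOf(row.year)].push_back(row);  // row-by-row copy
  }
  return buckets;
}
```

Note that every row is copied exactly once on its way into a bucket, which
is the memcpy discussed in this comment.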
That's a rough description, and there are corner cases, but the point is we
already have to do a memcpy in order to handle the partitioning (partitioning
is unfortunately a rather row-oriented operation) and so we want to go ahead
and satisfy the alignment requirement at that point. This way, when we are
ready to spill, we don't have to worry about alignment and we can just use
direct I/O without any extra memcpy.
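As a rough sketch of what "satisfy the alignment requirement at that point"
could look like, the copy that partitioning already performs can target an
aligned, padded buffer. This is not the MemoryPool API: AllocateAligned,
CopyToSpillBuffer, RoundUp and the 64-byte default are assumptions for
illustration, and 512 is the figure from the issue description. It relies on
std::aligned_alloc (C++17), which is not available on every platform.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Alignment assumed for direct I/O (512 bytes, per the issue description).
constexpr std::size_t kSpillAlignment = 512;

// Round n up to the next multiple of alignment (must be a power of two).
inline std::size_t RoundUp(std::size_t n, std::size_t alignment) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  return (n + alignment - 1) & ~(alignment - 1);
}

// Hypothetical allocator with an explicit alignment and a default value, in
// the spirit of the request. Release the result with std::free.
inline uint8_t* AllocateAligned(std::size_t size, std::size_t alignment = 64) {
  // Allocate at least one aligned unit so a zero size is well defined.
  std::size_t padded = RoundUp(size > 0 ? size : 1, alignment);
  return static_cast<uint8_t*>(std::aligned_alloc(alignment, padded));
}

// The one copy we were already going to do: move partitioned bytes into a
// buffer whose address and length are both multiples of kSpillAlignment.
// *padded_size receives the length to hand to the write call.
inline uint8_t* CopyToSpillBuffer(const void* src, std::size_t size,
                                  std::size_t* padded_size) {
  *padded_size = RoundUp(size > 0 ? size : 1, kSpillAlignment);
  uint8_t* dst = AllocateAligned(*padded_size, kSpillAlignment);
  if (dst == nullptr) return nullptr;
  std::memcpy(dst, src, size);
  std::memset(dst + size, 0, *padded_size - size);  // zero the padding
  return dst;
}
```

With something like this, the buffer produced during partitioning can be
handed to a direct I/O write as-is, assuming the file offset is aligned too.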
> [C++] Allow specifying of alignment in MemoryPool's allocations
> ----------------------------------------------------------------
>
> Key: ARROW-17836
> URL: https://issues.apache.org/jira/browse/ARROW-17836
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++
> Reporter: Sasha Krassovsky
> Assignee: Sasha Krassovsky
> Priority: Major
> Labels: pull-request-available
> Time Spent: 40m
> Remaining Estimate: 0h
>
> For spilling, I need to create buffers that are 512-byte aligned. The task is
> to augment MemoryPool to allow for specifying alignment explicitly when
> allocating (but keep the default the same).