[ 
https://issues.apache.org/jira/browse/ARROW-17836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17610137#comment-17610137
 ] 

Weston Pace commented on ARROW-17836:
-------------------------------------

Yes, the direct I/O PR offers a generic filesystem interface, so it may have to 
memcpy / buffer incoming data to satisfy alignment.  The difference with 
spilling is that we are already doing a memcpy higher up in the chain.  
When we spill, we take the data we need to spill and partition it.
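To make the contrast concrete, here is a rough C++ sketch of what a generic write path has to do when it cannot assume anything about the caller's buffer. `PrepareForDirectWrite`, `IsAligned`, `RoundUp` and the 512-byte `kBlockSize` are hypothetical names for illustration (not the direct I/O PR's actual code), and the O_DIRECT write itself is left out:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <new>

// Assumed direct I/O block size (the 512 bytes mentioned in the issue).
constexpr std::size_t kBlockSize = 512;
constexpr std::align_val_t kAlign = static_cast<std::align_val_t>(kBlockSize);

// True if p is a multiple of `alignment` bytes.
inline bool IsAligned(const void* p, std::size_t alignment) {
  assert(alignment > 0);
  return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// Round n up to the next multiple of `alignment`.
inline std::size_t RoundUp(std::size_t n, std::size_t alignment) {
  assert(alignment > 0);
  return (n + alignment - 1) / alignment * alignment;
}

// Hypothetical generic write path: the caller may hand us any pointer and
// length, so unless both are already block-aligned we must memcpy the data
// into an aligned, zero-padded bounce buffer first.
// Returns the number of bytes that had to be copied (0 if none).
std::size_t PrepareForDirectWrite(const std::uint8_t* data, std::size_t size) {
  if (IsAligned(data, kBlockSize) && size % kBlockSize == 0) {
    return 0;  // Usable as-is: no extra copy.
  }
  std::size_t padded = RoundUp(size, kBlockSize);
  auto* bounce = static_cast<std::uint8_t*>(::operator new(padded, kAlign));
  std::memcpy(bounce, data, size);               // the extra copy
  std::memset(bounce + size, 0, padded - size);  // pad the tail block
  // ... a real implementation would issue the direct I/O write here ...
  ::operator delete(bounce, kAlign);
  return size;
}
```

The interesting part is the return value: an unaligned pointer, or a length that is not a multiple of 512, costs a full extra copy of the data.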

For example, suppose we want to add spilling to sorting, and say we are sorting 
by date. If we've accumulated too much data, we might partition it into 
decade-sized buckets and persist them to disk. Then, once all the data has 
arrived, we can process a single decade at a time (with the hope that one 
decade of data is small enough to fit in memory).
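A minimal sketch of the decade partitioning step, assuming a hypothetical `Row` that carries a year as its sort key. The names `Row`, `DecadeOf` and `PartitionByDecade` are made up for illustration, and the actual spill and read-back are only described in comments:

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical row: the sort key (a year) plus some payload.
struct Row {
  std::int32_t year;
  std::int64_t payload;
};

// Decade bucket for a year, e.g. 1987 -> 1980. Uses floor division so that
// negative years also land in the correct bucket.
inline std::int32_t DecadeOf(std::int32_t year) {
  std::int32_t q = year / 10;
  if (year % 10 < 0) --q;
  return q * 10;
}

// Split accumulated rows into decade-sized buckets. In a spilling sort, each
// bucket would be persisted to disk; later one bucket at a time is read back,
// sorted in memory and emitted. std::map keeps buckets in ascending order.
std::map<std::int32_t, std::vector<Row>> PartitionByDecade(
    const std::vector<Row>& rows) {
  std::map<std::int32_t, std::vector<Row>> buckets;
  for (const Row& r : rows) {
    buckets[DecadeOf(r.year)].push_back(r);  // row-at-a-time copy
  }
  return buckets;
}
```

Note the per-row `push_back`: partitioning inherently rearranges data row by row, so a copy happens here whether or not we spill.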

That's a rough description, and there are corner cases, but the point is that 
we already have to do a memcpy to handle the partitioning (partitioning is, 
unfortunately, a rather row-oriented operation), so we want to satisfy the 
alignment requirement at that point.  This way, when we are ready to spill, we 
don't have to worry about alignment and can just use direct I/O without any 
extra memcpy.
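Here is a rough sketch of doing that partitioning copy straight into block-aligned storage. `AlignedPartitionBuffer` is a hypothetical class, and the 512-byte block size and zero-padding of the tail are assumptions about what the direct I/O write needs; a real implementation would presumably get its memory from the MemoryPool rather than calling `operator new` directly:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <new>

// Assumed direct I/O block size (512 bytes, as in ARROW-17836).
constexpr std::size_t kBlock = 512;
constexpr std::align_val_t kBlockAlign = static_cast<std::align_val_t>(kBlock);

// Hypothetical append-only buffer for one partition. Storage is 512-byte
// aligned and its capacity is a multiple of 512, so the memcpy done while
// partitioning already puts the bytes where direct I/O wants them.
class AlignedPartitionBuffer {
 public:
  explicit AlignedPartitionBuffer(std::size_t capacity_blocks)
      : capacity_(capacity_blocks * kBlock),
        data_(static_cast<std::uint8_t*>(::operator new(capacity_, kBlockAlign))) {}
  ~AlignedPartitionBuffer() { ::operator delete(data_, kBlockAlign); }
  AlignedPartitionBuffer(const AlignedPartitionBuffer&) = delete;
  AlignedPartitionBuffer& operator=(const AlignedPartitionBuffer&) = delete;

  // Copy one row's bytes in: the memcpy partitioning needs anyway.
  // Returns false if the row does not fit (the caller would spill first).
  bool Append(const void* row, std::size_t n) {
    if (size_ + n > capacity_) return false;
    std::memcpy(data_ + size_, row, n);
    size_ += n;
    return true;
  }

  // Length to hand to a direct I/O write: zero-pad the tail up to the next
  // block boundary in place, so no second copy is needed.
  std::size_t PrepareSpill() {
    std::size_t padded = (size_ + kBlock - 1) / kBlock * kBlock;
    std::memset(data_ + size_, 0, padded - size_);
    return padded;
  }

  const std::uint8_t* data() const { return data_; }
  std::size_t size() const { return size_; }

 private:
  std::size_t capacity_;  // declared before data_: initialized first
  std::uint8_t* data_;
  std::size_t size_ = 0;
};
```

Because `PrepareSpill` pads in place, `data()` and the padded length can be passed to the write as-is, which is the "no extra memcpy" property described above.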

> [C++] Allow specifying of alignment in MemoryPool's allocations 
> ----------------------------------------------------------------
>
>                 Key: ARROW-17836
>                 URL: https://issues.apache.org/jira/browse/ARROW-17836
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Sasha Krassovsky
>            Assignee: Sasha Krassovsky
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> For spilling, I need to create buffers that are 512-byte aligned. The task is 
> to augment MemoryPool to allow for specifying alignment explicitly when 
> allocating (but keep the default the same).
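For reference, the shape of the change the issue asks for can be sketched with a toy pool. `ToyAlignedPool` is hypothetical and is not Arrow's `MemoryPool`; the 64-byte default is an assumption standing in for whatever the pool's existing default alignment is:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// Toy pool (not Arrow's MemoryPool): an alignment-aware Allocate plus an
// overload with the old signature, so existing callers are unaffected.
class ToyAlignedPool {
 public:
  static constexpr std::size_t kDefaultAlignment = 64;  // assumed default

  // Explicit alignment; must be a power of two for aligned operator new.
  std::uint8_t* Allocate(std::size_t size, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    void* p = ::operator new(size, static_cast<std::align_val_t>(alignment));
    bytes_allocated_ += size;
    return static_cast<std::uint8_t*>(p);
  }
  // Old signature: same behavior as before, using the default alignment.
  std::uint8_t* Allocate(std::size_t size) {
    return Allocate(size, kDefaultAlignment);
  }

  // Callers must pass the same alignment they allocated with.
  void Free(std::uint8_t* p, std::size_t size, std::size_t alignment) {
    ::operator delete(p, static_cast<std::align_val_t>(alignment));
    bytes_allocated_ -= size;
  }
  void Free(std::uint8_t* p, std::size_t size) {
    Free(p, size, kDefaultAlignment);
  }

  std::size_t bytes_allocated() const { return bytes_allocated_; }

 private:
  std::size_t bytes_allocated_ = 0;
};
```

Keeping the one-argument overload is what "keep the default the same" amounts to here: only code that wants 512-byte buffers for spilling needs to pass an alignment.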



--
This message was sent by Atlassian Jira
(v8.20.10#820010)