proteetpaul-desco opened a new issue, #43334:
URL: https://github.com/apache/arrow/issues/43334

   ### Describe the enhancement requested
   
   We need to transfer filtered Arrow arrays, which may be chunked, 
from a C++ server to a Python client. The Python client builds NumPy arrays 
from the Arrow arrays, which requires flattening the chunks of each Arrow 
chunked array into a single contiguous array. This flattening, whether done on 
the client or on the server, costs time and memory because it creates 
additional copies.
   
   As a solution, we propose enhancing the `RecordBatchWriter` class to 
optionally concatenate Arrow buffers during network transfer. This enhancement 
would allow the sending process to send multiple buffers sequentially over the 
network socket, while the receiving process would interpret these buffers as a 
single contiguous unit.
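   
   To illustrate the idea only, here is a minimal standard-library sketch. It 
is not the Arrow IPC format: the helper names `write_unified` and 
`read_unified` and the 8-byte length prefix are hypothetical, and byte order 
is assumed to be the same on both ends.

```python
import io
import struct
from array import array


def write_unified(stream, chunks):
    """Write several chunk buffers back to back after one total-length prefix."""
    views = [memoryview(chunk).cast("B") for chunk in chunks]
    total = sum(view.nbytes for view in views)
    stream.write(struct.pack("<Q", total))  # hypothetical 8-byte length header
    for view in views:
        # Each chunk is written directly; the sender never concatenates them.
        stream.write(view)


def read_unified(stream, typecode):
    """Read the length prefix, then fill one contiguous buffer from the stream."""
    (total,) = struct.unpack("<Q", stream.read(8))
    buf = bytearray(total)
    view = memoryview(buf)
    received = 0
    while received < total:
        n = stream.readinto(view[received:])
        if not n:
            raise EOFError("stream ended before the full buffer arrived")
        received += n
    # Reinterpret the contiguous bytes as typed values (native byte order).
    return view.cast(typecode)


# An in-memory stream stands in for the network socket.
stream = io.BytesIO()
write_unified(stream, [array("q", [1, 2, 3]), array("q", [1, 2, 3])])
stream.seek(0)
result = read_unified(stream, "q")
print(result.tolist())  # [1, 2, 3, 1, 2, 3]
```

   The receiver allocates its buffer once and reads straight into it, so 
neither side materializes an extra concatenated copy of the chunks.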
   
   Please let us know if this idea sounds agreeable. We are willing to 
implement the solution ourselves.
   
   **Example code snippet:**
   <ins>Server (sender): [C++ code replaced with Python for simplicity]</ins>
   ```
   >> arr = pyarrow.array([1,2,3])
   >> chunked_arr = pyarrow.chunked_array([arr, arr])
   >> tbl = pyarrow.table([chunked_arr], names=['a'])
   >> options = pyarrow.ipc.IpcWriteOptions()
   >> # Proposed IPC write option to enable unification of array chunks on the wire
   >> options.unify_array_chunks = True
   >> writer = pyarrow.RecordBatchStreamWriter(<stream>, tbl.schema, options=options)
   >> writer.write_table(tbl)
   ```
   
   <ins>Client (receiver):</ins>
   ```
   >> reader = pyarrow.RecordBatchStreamReader(<stream>)
   >> tbl = reader.read_all()
   >> tbl.columns[0]    # Should have a single chunk
   <pyarrow.lib.ChunkedArray object>
   [
     [
       1,
       2,
       3,
       1,
       2,
       3
     ]
   ]
   ```
   
   ### Component(s)
   
   C++


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
