mbignotti opened a new issue, #37008: URL: https://github.com/apache/arrow/issues/37008
### Describe the usage question you have. Please include as many useful details as possible.

Hi! I'm relatively new to Arrow, so I apologize if the question is not 100% clear. I've posted [here](https://stackoverflow.com/questions/76785928/fifo-queue-with-polars-dataframe) what I'm trying to achieve, but I feel this is probably easier to solve with PyArrow tools.

In essence, I'm looking for the most efficient way to implement a FIFO queue with Arrow data, where the size of the queue is defined in terms of the number of rows in a table. Here is a trivial example with Tables:

```python
import pyarrow as pa
import pyarrow.compute as pc

# Define the maximum number of rows in the buffer
buffer_size = 4

# Initialize the buffer with some data
buffer = pa.table([pa.array([1, 2, 3, 4]), pa.array([11, 12, 13, 14])], names=['a', 'b'])

# A new table arrives; it needs to be appended to the buffer
new_table = pa.table([pa.array([5, 6]), pa.array([15, 16])], names=['a', 'b'])

# Append the new data to the buffer
buffer = pa.concat_tables([buffer, new_table])

# Trim the buffer so that it respects the maximum number of rows allowed
buffer_len = len(buffer)
buffer = pc.take(buffer, pa.array([buffer_len - i - 1 for i in reversed(range(buffer_size))]))
# Not sure there's an easier way to select the last n rows

# Continue...
```

Of course, incoming data must respect the schema. However, I'm pretty sure the approach shown above is not the most efficient one. Can we use Arrow buffers to achieve the same result? Or maybe something else? Thanks a lot!

### Component(s)

Python
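For the "select the last n rows" step, one possibly cheaper alternative to `pc.take` is `Table.slice`, which is zero-copy: it adjusts offsets into the existing chunks rather than gathering rows into a new table. A minimal sketch of the FIFO update step using `slice` (same toy data as above):

```python
import pyarrow as pa

buffer_size = 4
buffer = pa.table([pa.array([1, 2, 3, 4]), pa.array([11, 12, 13, 14])], names=['a', 'b'])
new_table = pa.table([pa.array([5, 6]), pa.array([15, 16])], names=['a', 'b'])

# Append, then keep only the last `buffer_size` rows.
# Table.slice is zero-copy, so no row data is materialized here.
buffer = pa.concat_tables([buffer, new_table])
if len(buffer) > buffer_size:
    buffer = buffer.slice(len(buffer) - buffer_size)

assert buffer.to_pydict() == {'a': [3, 4, 5, 6], 'b': [13, 14, 15, 16]}
```

Note that repeated `concat_tables` calls accumulate many small chunks over time, so periodically calling `buffer.combine_chunks()` may be worthwhile to keep downstream operations fast.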
