eddelbuettel opened a new issue, #386: URL: https://github.com/apache/arrow-nanoarrow/issues/386
The package contains examples of creating ArrayStream objects given a schema and a vector or list of arrays. That helps for chunks returned via, say, a RecordBatchReader, as this may not require the contiguous memory an unchunked approach would need. But because we instantiate with the whole vector (or list), we still require a similar total amount of memory at instantiation.

Can we instead create, say, a RecordBatchReader in a more 'streaming' fashion? Could we hand this back to the caller with only the initially-known list of Arrays _and also support further data_? So, say, the first call of `next()` would be covered, but thereafter a more 'lazy' approach is used and the RecordBatchReader supplies updates in true batches. Obviously a more complicated setup, but is something like this feasible / supported / planned / ... ?

I may be explaining myself poorly here, but are there other references in the Arrow context that handle this as a more 'open' subscription (in the sense of 'total payload unknown at instantiation') with later callbacks to provide chunked updates? Or do I have the wrong mental model, and should I rather think about, say, a pub/sub model where a 'middle man' holds on to the data and passes it along? (I have done such things with Redis.)

Thanks in advance for any pointers, and apologies for posting such a vague and rambling issue.
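For reference, the Arrow C stream interface (`ArrowArrayStream`, which nanoarrow wraps) is already pull-based at the C level: the consumer requests batches one at a time via a `get_next` callback, so the total payload need not be known at instantiation. A minimal Python-flavoured sketch of that pull model follows; all names here (`LazyBatchReader`, `read_next`, `fetch_more`) are hypothetical illustrations, not a real nanoarrow or arrow API.

```python
# Sketch of a pull-based reader: batches known up front are queued,
# and later batches are fetched lazily via a callback, mirroring the
# get_next semantics of the Arrow C stream interface. Hypothetical
# names throughout; for illustration only.
from collections import deque
from typing import Callable, Iterable, List, Optional


class LazyBatchReader:
    def __init__(self, initial_batches: Iterable[list],
                 fetch_more: Callable[[], Optional[list]]):
        # Batches available at instantiation are queued up front ...
        self._pending = deque(initial_batches)
        # ... and this callback supplies further batches on demand,
        # returning None once the stream is exhausted.
        self._fetch_more = fetch_more
        self._done = False

    def read_next(self) -> Optional[list]:
        """Return the next batch, or None at end of stream."""
        if self._pending:
            return self._pending.popleft()
        if self._done:
            return None
        batch = self._fetch_more()
        if batch is None:
            self._done = True
        return batch


# Usage: early read_next() calls are served from the initial batches;
# later ones pull from the callback only when the consumer asks.
later_chunks = iter([[4, 5], [6]])
reader = LazyBatchReader([[1, 2], [3]], lambda: next(later_chunks, None))
batches: List[list] = []
while (b := reader.read_next()) is not None:
    batches.append(b)
# batches == [[1, 2], [3], [4, 5], [6]]
```

The point of the sketch is that nothing forces the producer to materialize the whole payload first: memory is bounded by one batch at a time once the initial chunks are drained.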
