eddelbuettel opened a new issue, #386: URL: https://github.com/apache/arrow-nanoarrow/issues/386
The package contains examples of creating ArrayStream objects given a schema and a vector or list of arrays. That helps for chunks returned via, say, a RecordBatchReader, as this may not require the contiguous memory an unchunked approach would need. But because we instantiate with the whole vector (or list), we still require a similar total amount of memory at instantiation.

Can we instead create, say, a RecordBatchReader in a more 'streaming' fashion? Could we hand this back to the caller with only the initially-known list of Arrays _and also support further data_? So, say, the first call of `next()` would be covered, but thereafter a more 'lazy' approach is used and the RecordBatchReader supplies updates in true batches. Obviously a more complicated setup, but is something like this feasible / supported / planned / ... ?

I may be explaining myself poorly here, but are there other references in the Arrow context that handle this as a more 'open' subscription (in the sense of 'total payload unknown at instantiation') with later callbacks to provide chunked updates? Or do I have the wrong mental model, and should I rather think about, say, a pub/sub model where a 'middle man' holds on to the data and passes it along? (I have done such things with Redis.)

Thanks in advance for any pointers, and apologies for posting such a vague and rambling issue.
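For reference, the Arrow C stream interface (`ArrowArrayStream`, which nanoarrow wraps) is already pull-based at the C level: the consumer requests batches one at a time via a `get_next` callback, so the total payload need not be known at instantiation. A minimal Python-flavoured sketch of that pull model follows; all names here (`LazyBatchReader`, `read_next`, `fetch_more`) are hypothetical illustrations, not a real nanoarrow or arrow API.

```python
# Sketch of a pull-based reader: batches known up front are queued,
# and later batches are fetched lazily via a callback, mirroring the
# get_next semantics of the Arrow C stream interface. Hypothetical
# names throughout; for illustration only.
from collections import deque
from typing import Callable, Iterable, List, Optional


class LazyBatchReader:
    def __init__(self, initial_batches: Iterable[list],
                 fetch_more: Callable[[], Optional[list]]):
        # Batches available at instantiation are queued up front ...
        self._pending = deque(initial_batches)
        # ... and this callback supplies further batches on demand,
        # returning None once the stream is exhausted.
        self._fetch_more = fetch_more
        self._done = False

    def read_next(self) -> Optional[list]:
        """Return the next batch, or None at end of stream."""
        if self._pending:
            return self._pending.popleft()
        if self._done:
            return None
        batch = self._fetch_more()
        if batch is None:
            self._done = True
        return batch


# Usage: early read_next() calls are served from the initial batches;
# later ones pull from the callback only when the consumer asks.
later_chunks = iter([[4, 5], [6]])
reader = LazyBatchReader([[1, 2], [3]], lambda: next(later_chunks, None))
batches: List[list] = []
while (b := reader.read_next()) is not None:
    batches.append(b)
# batches == [[1, 2], [3], [4, 5], [6]]
```

The point of the sketch is that nothing forces the producer to materialize the whole payload first: memory is bounded by one batch at a time once the initial chunks are drained.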
