gandronchik commented on PR #2177:
URL: 
https://github.com/apache/arrow-datafusion/pull/2177#issuecomment-1118507143

   > @gandronchik thank you for the explanation in this PR's description. It 
helps, though I will admit I still don't fully understand what is going on.
   > 
   > I agree with @doki23 -- I expect a table function to logically return a 
table (that is, something with both rows and columns)
   > 
   > > Regarding signature, I decided to use a single vector and vector with 
sizes of sections instead of vec of vecs to have better performance. If we use 
Vec, this will require a lot of memory in case of a request for millions of 
rows.
   > 
   > The way the rest of DataFusion avoids buffering all the intermediate 
results in memory at once is with `Stream`s, but that requires interacting 
with Rust's `async` ecosystem, which is non-trivial
   > 
   > If you wanted a streaming solution, that would mean the signature might 
look something like the following (maybe)
   > 
   > ```rust
   > Arc<dyn Fn(Box<dyn SendableRecordBatchStream>) -> Result<Box<dyn 
SendableRecordBatchStream>> + Send + Sync>;
   > ```
   
   Looks like I got the title wrong. I have implemented a function that returns 
many rows, so it is probably not a table function. If I rename it, will that be fine?
   
   Regarding the function signature, I think my solution is a compromise 
between a vec of vecs and streaming. In practice, I don't think such a function 
will return that many rows. However, of course, I will rewrite it if you want. 
So which solution do we choose: the current 
`Result<(ArrayRef, Vec<usize>)> + Send + Sync>`, 
`Result<Vec<ColumnarValue>> + Send + Sync>`, or 
`Result<Box<dyn SendableRecordBatchStream>> + Send + Sync>`?
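   
   To make the current layout concrete, here is a minimal sketch of the idea 
in plain Rust, without any Arrow types. It assumes that the `Vec<usize>` holds 
the number of output rows produced for each input row, in order, and that the 
values of all sections are stored back to back in one buffer. The name 
`split_sections` and the use of a slice in place of `ArrayRef` are hypothetical 
and only for illustration; this is not the code from the PR.
   
   ```rust
   /// Hypothetical helper: view a flat buffer as one section per input row.
   ///
   /// `values` stands in for the single output array, and `sizes[i]` is
   /// assumed to be the number of output rows produced for input row `i`.
   /// Returns `None` if the sizes do not cover the buffer exactly.
   fn split_sections<'a, T>(values: &'a [T], sizes: &[usize]) -> Option<Vec<&'a [T]>> {
       // Every value must belong to exactly one section.
       if sizes.iter().sum::<usize>() != values.len() {
           return None;
       }

       let mut sections = Vec::with_capacity(sizes.len());
       let mut offset = 0;
       for &size in sizes {
           // Borrow the section instead of copying it, so no per-row
           // allocation is needed (unlike a vec of vecs).
           sections.push(&values[offset..offset + size]);
           offset += size;
       }
       Some(sections)
   }
   ```
   
   With values `[1, 2, 10, 11, 12]` and sizes `[2, 0, 3]`, the sections are 
`[1, 2]`, `[]` and `[10, 11, 12]`: one allocation for the values and one for 
the sizes, regardless of how many input rows there are.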

