Rachelint commented on PR #11758:
URL: https://github.com/apache/datafusion/pull/11758#issuecomment-2283653639

   > > @JasonLi-cn I think `GroupValues` impls probably should not care about the `batch size`. We can just do the split-and-merge work in `GroupedHashAggregateStream::poll` if, unfortunately, `batch size != block size` (usually they will be equal).
   > > Maybe we should implement special block-based `GroupValues` impls like the following?
   > > 
   > > * We pass the `block size` when initializing it
   > > * It manages the inner values block by block
   > > * It returns all blocks with the internal `block size`
   > >   We can always make `block size == batch size`, so we can totally avoid any split operations.
   > > 
   > > I am trying this in #11943, and have made some related code changes.
   > 
   > OK. How do we determine the value of block size?
   
   
   
   
   I think we can make it equal to `batch_size` in most cases, so that we 
avoid any split operations when producing output. For corner cases, for 
example when `batch_size` is too small, we can fall back to single-block 
mode.
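
   A rough sketch of that selection rule, just to make the idea concrete. All names here (`GroupStorageMode`, `choose_mode`, `block_row_counts`) and the `MIN_BLOCK_SIZE` threshold are hypothetical and chosen for illustration; they are not the actual DataFusion API or the code in #11943.

```rust
/// Hypothetical threshold below which blocks are treated as too small to be
/// worthwhile. The value is an assumption for illustration only.
const MIN_BLOCK_SIZE: usize = 128;

/// How group values would be stored (hypothetical names).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum GroupStorageMode {
    /// Everything lives in one block; output has to be split into batches.
    SingleBlock,
    /// Values are kept in fixed-size blocks; each block is emitted as a batch.
    Blocked { block_size: usize },
}

/// Use `block_size == batch_size` in the common case, and fall back to
/// single-block mode when `batch_size` is too small.
pub fn choose_mode(batch_size: usize) -> GroupStorageMode {
    if batch_size < MIN_BLOCK_SIZE {
        GroupStorageMode::SingleBlock
    } else {
        GroupStorageMode::Blocked { block_size: batch_size }
    }
}

/// Row counts of the blocks needed to hold `num_groups` groups: every block
/// is full except possibly the last one.
pub fn block_row_counts(num_groups: usize, block_size: usize) -> Vec<usize> {
    assert!(block_size > 0, "block_size must be non-zero");
    let full_blocks = num_groups / block_size;
    let remainder = num_groups % block_size;
    let mut counts = vec![block_size; full_blocks];
    if remainder > 0 {
        counts.push(remainder);
    }
    counts
}
```

   With `block_size == batch_size`, no block ever holds more than `batch_size` rows, so each block can be returned as one output batch without splitting.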


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

