westonpace opened a new pull request, #34060:
URL: https://github.com/apache/arrow/pull/34060

   This PR introduces the concept of `ExecBatch::index` but does not yet do much 
with it.  As a proof of concept, this PR adds a fetch node which can be inserted 
anywhere in the plan (not just at the sink) to satisfy `LIMIT x OFFSET y` 
(Substrait calls this operation "fetch", so I have used the same name).
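
   To illustrate how a fetch can be applied batch by batch, here is a minimal, 
self-contained sketch. It is not the code in this PR: `BatchSpan`, `Slice`, and 
`FetchSlice` are hypothetical names, and the sketch assumes each batch's 
starting row is already known (for example, because batches are handled in 
index order).

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical description of one batch in an ordered stream: `start_row` is
// the total number of rows in all batches that precede it.
struct BatchSpan {
  int64_t start_row;
  int64_t num_rows;
};

// The rows to keep from a batch: [offset, offset + length) within that batch.
struct Slice {
  int64_t offset;
  int64_t length;
};

// Intersect the batch's row range with the window [skip, skip + limit) that
// `LIMIT limit OFFSET skip` selects. Returns nullopt if no rows survive.
std::optional<Slice> FetchSlice(const BatchSpan& batch, int64_t skip,
                                int64_t limit) {
  assert(skip >= 0 && limit >= 0 && batch.num_rows >= 0);
  const int64_t begin = std::max(batch.start_row, skip);
  const int64_t end = std::min(batch.start_row + batch.num_rows, skip + limit);
  if (begin >= end) return std::nullopt;
  return Slice{begin - batch.start_row, end - begin};
}
```

   With 10-row batches and `LIMIT 15 OFFSET 5`, the first batch keeps its last 
five rows, the second batch is kept whole, and every later batch is dropped.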
   
   This PR also introduces two sequencing accumulation queues which will be 
useful, I hope, for anyone implementing nodes that rely on ordered execution.
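
   As a rough picture of what sequencing means here, the sketch below buffers 
batches that arrive out of order and releases them strictly by index. It is a 
single-threaded illustration under my own assumptions, not the queues added in 
this PR; `IndexedBatch` and `SequencingQueueSketch` are hypothetical names, and 
indices are assumed to start at 0 with each index arriving exactly once.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for a batch that carries its position in the stream.
struct IndexedBatch {
  int64_t index;
  std::string payload;
};

// Holds batches until every lower-indexed batch has been emitted, then passes
// them to `emit` in index order. Not thread-safe; a real node would need
// locking or a task scheduler around Insert.
class SequencingQueueSketch {
 public:
  explicit SequencingQueueSketch(std::function<void(const IndexedBatch&)> emit)
      : emit_(std::move(emit)) {}

  void Insert(IndexedBatch batch) {
    assert(batch.index >= next_index_);  // already-emitted indices are invalid
    const int64_t index = batch.index;
    pending_.emplace(index, std::move(batch));
    // Drain every batch that is now contiguous with what was already emitted.
    for (auto it = pending_.begin();
         it != pending_.end() && it->first == next_index_;
         it = pending_.erase(it)) {
      emit_(it->second);
      ++next_index_;
    }
  }

  std::size_t num_pending() const { return pending_.size(); }

 private:
  std::function<void(const IndexedBatch&)> emit_;
  std::map<int64_t, IndexedBatch> pending_;  // ordered by batch index
  int64_t next_index_ = 0;
};
```

   Inserting batches 2 and 1 emits nothing; inserting batch 0 then releases all 
three in order. The buffering is the cost of sequencing: memory use grows with 
how far ahead of the next expected index the producers run.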
   
   This PR unfortunately introduces a new query option that controls whether 
the sink node should pay the small performance hit required to sequence output. 
 While considering how best to add this option, I realized we will probably 
have more query options in the near future regarding "how much RAM to use" 
(e.g. spillover) and potentially more beyond that.
   
   So I have taken all the options and put them into 
`arrow::compute::QueryOptions` (this already existed but it was not user facing 
and I added more things to it).  I added a new DeclarationToXyz overload that 
accepts QueryOptions.  This has, unfortunately, led to a bit of overload 
explosion but I think this should be the last new addition to the overload set 
(and we can deprecate the older overloads at some point).
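
   The motivation for bundling options can be sketched without Arrow at all. 
Everything below is hypothetical (`PlanSketch`, `PlanOptionsSketch`, 
`DescribeRun`, and the field names are illustrative and are not the contents of 
`arrow::compute::QueryOptions`); it only shows why one overload taking an 
options struct scales better than one overload per option.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical options bundle. New options become new fields with defaults,
// so existing call sites keep compiling unchanged.
struct PlanOptionsSketch {
  bool use_threads = true;
  bool sequence_output = false;   // pay for ordered output only on request
  int64_t max_memory_bytes = -1;  // -1 means "no limit" in this sketch
};

// Hypothetical stand-in for a plan declaration.
struct PlanSketch {
  std::string name;
};

// A single entry point that accepts the bundle, instead of a separate
// overload for each combination of options.
std::string DescribeRun(const PlanSketch& plan,
                        const PlanOptionsSketch& options = {}) {
  std::string out = plan.name;
  out += options.use_threads ? ": threaded" : ": serial";
  out += options.sequence_output ? ", sequenced" : ", unsequenced";
  return out;
}
```

   A caller that wants ordered output sets one field and leaves the rest at 
their defaults.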
   
   This PR also includes a new `gen::Gen / gen::TestGen` facility for 
generating test tables for input.  I'd like to eventually use this to simplify 
some of the existing exec plan tests as well.  I'm willing to split this into a 
separate PR if that makes sense.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
