zifengyu commented on PR #14158: URL: https://github.com/apache/arrow/pull/14158#issuecomment-1325893750
This feature is exactly what we need in order to adopt Acero. I tried adding ExecBatch ordering and implemented a limit operator in our product. Here is what we saw in testing:

1. It is somewhat difficult to finish the node (and notify the downstream node) because the input and output batch counts differ. In our case, the node can finish in two ways: when the limit number of rows has been reached, or when the upstream node finishes producing before the limit is reached. The former happens in the Queue's deliver task, while the latter happens in FetchNode's InputFinished. We did not find an easy way to synchronize these two components, so we moved the queue inside the node and added a counter to track the rows sent.
2. We also need an `offset` setting to skip the first few rows in the limit operator. Could this be included in FetchNode so that we can switch back to the Acero node in the future?

This proposal is critical to our use of Acero, and we are looking forward to its release.
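
To make the first point concrete, here is a minimal, language-agnostic sketch of the approach described above: a single component that handles both finish conditions and keeps a counter of sent rows, with an `offset` to skip leading rows. It is written in plain Python and does not use the Acero API. The class `FetchSketch`, its methods, and the `emit`/`finish` callbacks are all hypothetical names chosen for illustration, and the sketch assumes batches arrive in order and on a single thread.

```python
from typing import Callable, List


class FetchSketch:
    """Hypothetical limit/offset operator; NOT Acero's FetchNode."""

    def __init__(self, offset: int, limit: int,
                 emit: Callable[[List], None],
                 finish: Callable[[int], None]) -> None:
        self.to_skip = offset      # rows still to be dropped
        self.remaining = limit     # rows still allowed downstream
        self.sent_rows = 0         # counter of rows sent so far
        self.finished = False
        self._emit = emit          # assumed downstream "batch" callback
        self._finish = finish      # assumed downstream "finished" callback

    def on_batch(self, batch: List) -> None:
        """Handle one ordered input batch (a list of rows)."""
        if self.finished:
            return  # limit already reached; ignore late batches
        if self.to_skip > 0:
            dropped = min(self.to_skip, len(batch))
            batch = batch[dropped:]
            self.to_skip -= dropped
        taken = batch[:self.remaining]
        if taken:
            self._emit(taken)
            self.sent_rows += len(taken)
            self.remaining -= len(taken)
        if self.remaining == 0:
            self._finish_once()  # finish path 1: limit reached

    def input_finished(self) -> None:
        """Upstream has no more batches."""
        self._finish_once()      # finish path 2: input ran out first

    def _finish_once(self) -> None:
        # Both paths funnel through here, so downstream is notified once.
        if not self.finished:
            self.finished = True
            self._finish(self.sent_rows)


out, done = [], []
node = FetchSketch(offset=2, limit=3, emit=out.append, finish=done.append)
node.on_batch([0, 1, 2])
node.on_batch([3, 4, 5, 6])
node.input_finished()  # no-op: the limit was already reached
```

Because both finish paths go through `_finish_once`, the downstream callback fires exactly once whether the limit is hit or the input ends early. A real multi-threaded node would additionally need synchronization around the counters.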
