paleolimbot commented on PR #43632:
URL: https://github.com/apache/arrow/pull/43632#issuecomment-2286856527
Echoing all the thanks to Weston for the detailed response!
I wonder if it is worth clarifying the goals and non-goals of this proposal.
In my mind, this is about reconciling two very different ways engines/APIs
operate (push vs. pull). I don't have much experience on the performance side,
but on the development time/lines-of-code side, trying to make a producer that
expects to push its output interact with a consumer that wants to pull is
expensive (the reverse is also true). This gets more and more complicated the
more times this mismatch is encountered in a pipeline.
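
As a rough illustration of that cost, here is a minimal sketch of adapting a push-style producer to a pull-style consumer. Every name in it (`push_producer`, `pull_from_push`, `max_pending`) is hypothetical and not part of the proposal or of any Arrow API; it only assumes that the producer drives its own loop and hands batches to a callback. Even this toy version needs a helper thread, a bounded queue for backpressure, and an end-of-stream sentinel, and it still leaves out error propagation and cancellation.

```python
import queue
import threading

_DONE = object()  # sentinel marking end of stream


def push_producer(on_batch):
    """Hypothetical push-style producer: it owns the loop and calls on_batch."""
    for i in range(3):
        on_batch([i, i + 1])


def pull_from_push(producer, max_pending=1):
    """Expose a push-style producer as a pull-style iterator.

    The producer runs on a helper thread. The bounded queue blocks it once
    max_pending batches are waiting, so it cannot run far ahead of the consumer.
    """
    pending = queue.Queue(maxsize=max_pending)

    def run():
        try:
            producer(pending.put)
        finally:
            # Always signal completion so the consumer does not wait forever.
            pending.put(_DONE)

    threading.Thread(target=run, daemon=True).start()

    while True:
        item = pending.get()
        if item is _DONE:
            return
        yield item


# The consumer pulls at its own pace.
batches = list(pull_from_push(push_producer))
```

If the consumer stops pulling early, the helper thread in this sketch stays blocked on `put`, which is the kind of detail that has to be solved again at every push/pull boundary.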
I worry that in the quest for the best possible performance we lose
any development time/lines-of-code advantage that a simpler approach might have
enabled! I also worry that an ABI that becomes too opinionated about how a
scanner should be implemented will still not be able to express other
("non-optimal"?) scanners that, for historical reasons (or because we were wrong
about what an optimal scanner looks like), don't work that way. I still think
that something like the original proposal (with clear, if imperfect,
expectations about what can or should happen in the callbacks) is *a* missing
piece (if not *the* missing piece).