wjones127 commented on issue #33986: URL: https://github.com/apache/arrow/issues/33986#issuecomment-1546424922
I have updated the document and created a rough sketch. I've also notified some devs from other projects, such as PyIceberg and dask-deltatable, to get more feedback.

Basically, I think the API that we have now for Datasets is actually very good. So doing as Chang originally suggested and just making a `typing.Protocol` out of it seems like it would be sufficient. **I think that's what we want, but I'm honestly not 100% sure of the best way to expose / publish this, so I would welcome feedback on that.**

There are some possible extensions that could be made in the future, but I don't think they should block us from defining a protocol now.

IMO, this is a good opportunity to define something that will work well enough for now. I don't think it will last the next 5-10 years. But what we learn from pushing this API to its limits may inform the design of something that's more robust and includes input from a much wider part of the PyData ecosystem.
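To make the `typing.Protocol` idea concrete, here is a minimal sketch of what structural typing buys: a producer can satisfy the interface without importing or subclassing anything from the library that defines it. The names `DatasetLike`, `ScannerLike`, `InMemoryDataset`, and `count_rows` are hypothetical and chosen only for illustration; the method names `scanner` and `to_batches` loosely mirror the existing Datasets API, but this is not the protocol from the linked document.

```python
from typing import Any, Callable, Iterator, List, Optional, Protocol, runtime_checkable


@runtime_checkable
class ScannerLike(Protocol):
    """Hypothetical scanner interface: something that yields batches."""

    def to_batches(self) -> Iterator[Any]:
        ...


@runtime_checkable
class DatasetLike(Protocol):
    """Hypothetical dataset interface: something that can create a scanner."""

    def scanner(
        self,
        columns: Optional[List[str]] = None,
        filter: Optional[Callable[[dict], bool]] = None,
    ) -> ScannerLike:
        ...


class _ListScanner:
    """Toy scanner that yields all rows as a single batch."""

    def __init__(self, rows: List[dict]) -> None:
        self._rows = rows

    def to_batches(self) -> Iterator[List[dict]]:
        yield self._rows


class InMemoryDataset:
    """Toy producer. It never imports or subclasses DatasetLike."""

    def __init__(self, rows: List[dict]) -> None:
        self._rows = rows

    def scanner(self, columns=None, filter=None) -> _ListScanner:
        rows = self._rows
        if filter is not None:
            # Assumption: the filter is a plain Python predicate on a row.
            rows = [row for row in rows if filter(row)]
        if columns is not None:
            rows = [{name: row[name] for name in columns} for row in rows]
        return _ListScanner(rows)


def count_rows(dataset: DatasetLike) -> int:
    """Consumer code written only against the protocol."""
    return sum(len(batch) for batch in dataset.scanner().to_batches())


dataset = InMemoryDataset([{"a": 1, "b": 2}, {"a": 3, "b": 4}])
print(isinstance(dataset, DatasetLike))
print(count_rows(dataset))
```

Note that `runtime_checkable` only verifies that the named methods exist, not their signatures or return types, so a static type checker would still be needed to validate a producer fully against such a protocol.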
