timsaucer opened a new issue, #749: URL: https://github.com/apache/datafusion-python/issues/749
**Is your feature request related to a problem or challenge? Please describe what you are trying to do.**

Per discussion in the DataFusion Python Discord channel, some users feel that the `datafusion-python` project is not "pythonic". Some, such as myself, have found it necessary to dig through the Rust documentation to discover how to use the features that are currently exposed. Some classes and functions have documentation available, but most do not.

For example, see the [API page for functions](https://datafusion.apache.org/python/generated/datafusion.functions.functions.html). Here is one randomly selected entry from that page:

```
datafusion.functions.functions.approx_distinct(*args, distinct=False)
```

As a user, the only way right now to understand which `args` can or must be passed, or what this function is for, is to dig into the Rust code, either in this repo or in the `datafusion` repo.

Additionally, a Python user who wants to browse the list of generated functions has no easy way to do so from the repository itself. They can look at the online documentation linked above, but many users prefer to clone the repo and read through the code themselves. For Python users who are unfamiliar with Rust procedural macros, it can be obscure how we generate and expose functions and classes. For these users, looking into the `python/datafusion` directory within this repo is not helpful.

**Describe the solution you'd like**

Similar to the approach used by the [polars project](https://github.com/pola-rs/polars), it would be nice to have Python wrappers for the functions and classes that our end users interact with. I have identified two downsides to doing this: it adds a step for the developer who exposes a new function, and it adds an extra function call each time a wrapped function is invoked. The benefit is that the repository will be much more user friendly to Python developers.
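To make the wrapper idea concrete, here is a minimal sketch of what one wrapped function might look like. It is not the project's actual code: `_RawExpr` and `_raw_approx_distinct` are hypothetical stand-ins for the objects exposed by the compiled extension, and the `Expr` class, `col` helper, and the wrapper's single `expression` parameter are assumptions made for illustration.

```python
"""Sketch of a documented Python wrapper around an opaque generated binding."""
from __future__ import annotations


class _RawExpr:
    """Hypothetical stand-in for an expression object from the compiled extension."""

    def __init__(self, text: str) -> None:
        self.text = text


def _raw_approx_distinct(*args, distinct=False):
    """Hypothetical stand-in for the generated binding and its opaque signature."""
    (arg,) = args  # the binding expects exactly one expression here
    return _RawExpr(f"approx_distinct({arg.text})")


class Expr:
    """Python-level expression that holds the raw binding object."""

    def __init__(self, raw: _RawExpr) -> None:
        self.raw = raw

    def __repr__(self) -> str:
        return f"Expr({self.raw.text})"


def col(name: str) -> Expr:
    """Create an expression that refers to a column by name."""
    return Expr(_RawExpr(name))


def approx_distinct(expression: Expr) -> Expr:
    """Return the approximate number of distinct values of ``expression``.

    Args:
        expression: The column or expression whose distinct values are counted.
    """
    # One extra Python call per invocation: unwrap, delegate, wrap again.
    return Expr(_raw_approx_distinct(expression.raw))


if __name__ == "__main__":
    print(approx_distinct(col("a")))
    help(approx_distinct)
```

A reader who clones the repo would then find a named parameter, a type hint, and a docstring in plain Python, at the cost of the extra call noted above.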
**Describe alternatives you've considered**

An alternative approach is to use `.pyi` files inside the `python/datafusion` directory, as started in [this repo](https://github.com/3ok/datafusion-stubs/tree/main/datafusion-stubs). These `.pyi` files serve a similar purpose to the wrappers described above, and they have the advantage of removing the additional function call that a wrapper introduces.

The downside of the `.pyi` approach is that there is no guarantee the files are kept up to date with the underlying code. Function parameters may change as the code evolves, and if no one updates the `.pyi` files we will have documentation that is out of sync with the underlying code. With wrappers, a change to these parameters will ideally be caught by the unit tests.