timsaucer opened a new issue, #749:
URL: https://github.com/apache/datafusion-python/issues/749

   **Is your feature request related to a problem or challenge? Please describe 
what you are trying to do.**
   
   Per discussion in the DataFusion Python Discord channel, some users feel 
that the `datafusion-python` project is not "pythonic". Some, myself included, 
have had to dig through the Rust documentation to discover how 
to use the features that are currently exposed. Some classes and functions are 
documented, but most are not. For example, see the [API page for 
functions](https://datafusion.apache.org/python/generated/datafusion.functions.functions.html).
 Here is one randomly selected entry from that page:
   
   ```
   datafusion.functions.functions.approx_distinct(*args, distinct=False)
   ```
   
   As a user, the only way right now to learn which `args` can or must be 
passed, or what this function is for, is to dig into the Rust code, either in 
this repo or in the `datafusion` repo.
   
   Additionally, a Python user who wants to look at the list of generated 
functions has no easy way to do so from the repository itself. The online 
documentation linked above is one option, but many users prefer to clone the 
repo and read the code themselves. For Python users who are unfamiliar with 
Rust procedural macros, it can be unclear how we generate and expose functions 
and classes. For these users, looking into the `python/datafusion` directory 
within this repo is not helpful.
   
   **Describe the solution you'd like**
   
   Similar to the approach used by the [polars 
project](https://github.com/pola-rs/polars), it would be nice to have Python 
wrappers for the functions and classes that our end users interact with. I have 
identified two downsides to doing this: it adds a step for the developer who 
exposes a new function, and it adds one more function call at runtime. The 
benefit is that the repository will be much more friendly to Python 
developers.
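   
   To make the proposal concrete, here is a minimal sketch of what such a 
wrapper might look like. It is not the project's actual code: the `Expr` class 
and `_raw_approx_distinct` are stand-ins I made up so the example runs without 
the compiled extension, and the parameter name and docstring wording are 
assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Expr:
    """Stand-in for an expression type (hypothetical, not the real class)."""

    text: str


def _raw_approx_distinct(*args, distinct=False):
    """Stand-in for a Rust-generated binding with an opaque signature."""
    inner = ", ".join(arg.text for arg in args)
    return Expr(f"approx_distinct({inner})")


def approx_distinct(expression: Expr) -> Expr:
    """Return the approximate number of distinct values of ``expression``.

    Args:
        expression: The column or expression whose distinct values are counted.
    """
    # The wrapper adds one extra Python call, but gives users a named,
    # typed parameter and a docstring they can read in the repository.
    return _raw_approx_distinct(expression)


print(approx_distinct(Expr("a")).text)
print(approx_distinct.__doc__.splitlines()[0])
```

   With a wrapper like this, `help(approx_distinct)` and the source under 
`python/datafusion` both show the expected argument instead of `*args`.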
   
   **Describe alternatives you've considered**
   
   An alternative approach is to use `.pyi` stub files inside the 
`python/datafusion` directory, as started in [this 
repo](https://github.com/3ok/datafusion-stubs/tree/main/datafusion-stubs). 
These `.pyi` files serve a similar purpose to the wrappers described above, and 
they avoid the additional function call that a wrapper introduces. The downside 
is that nothing guarantees the `.pyi` files stay up to date with the underlying 
code. Function parameters may change as the code evolves, and if a developer 
does not update the `.pyi` files, the documentation will be out of sync with 
the underlying code. With wrappers, a change to these parameters should ideally 
be caught by the unit tests.
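   
   As a rough illustration of that last point, the sketch below simulates a 
binding whose signature changes between two versions. All names are 
hypothetical, and plain Python functions stand in for the Rust bindings. A 
wrapper that is exercised by a test fails as soon as the binding gains a 
required parameter, whereas a `.pyi` stub is never executed and so would not 
notice.

```python
def binding_v1(expr):
    """Hypothetical binding as first released."""
    return f"approx_distinct({expr})"


def binding_v2(expr, precision):
    """The same hypothetical binding after gaining a required parameter."""
    return f"approx_distinct({expr}, {precision})"


def make_wrapper(binding):
    """Build a documented Python wrapper around a raw binding."""

    def approx_distinct(expr: str) -> str:
        """Return an approximate distinct-count expression for ``expr``."""
        return binding(expr)

    return approx_distinct


def wrapper_matches_binding(binding) -> bool:
    """Mimic a unit test: call the wrapper and report whether it still works."""
    try:
        make_wrapper(binding)("a")
    except TypeError:
        # The binding's parameters no longer match what the wrapper passes.
        return False
    return True


print(wrapper_matches_binding(binding_v1))  # True
print(wrapper_matches_binding(binding_v2))  # False
```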
   
   **Additional context**
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

