datapythonista opened a new pull request, #751:
URL: https://github.com/apache/datafusion-python/pull/751

   Currently, in the `datafusion` module, what `dir()` returns does not match 
what is actually inside the module. For example:
   
   ```python
   >>> import datafusion
   
   >>> dir(datafusion)
   ['ABCMeta', 'Accumulator', 'AggregateUDF', 'Alias', 'Analyze', 'Between', 
'Case', 'Cast', 'Config', 'CreateMemoryTable', 'CreateView', 'DFSchema', 
'DataFrame', 'Distinct', 'DropTable', 'Exists', 'Explain', 'Expr', 'Extension', 
'Filter', 'GroupingSet', 'ILike', 'InList', 'InSubquery', 'IsFalse', 
'IsNotFalse', 'IsNotNull', 'IsNotTrue', 'IsNotUnknown', 'IsTrue', 'IsUnknown', 
'Like', 'Limit', 'List', 'Negative', 'Not', 'Partitioning', 'Placeholder', 
'Projection', 'Repartition', 'RuntimeConfig', 'SQLOptions', 'ScalarSubquery', 
'ScalarUDF', 'ScalarVariable', 'SessionConfig', 'SessionContext', 'SimilarTo', 
'Sort', 'Subquery', 'SubqueryAlias', 'TableScan', 'TryCast', 'Window', 
'WindowFrame', '__all__', '__builtins__', '__cached__', '__doc__', '__file__', 
'__loader__', '__name__', '__package__', '__path__', '__spec__', '__version__', 
'_internal', 'abstractmethod', 'col', 'column', 'common', 'expr', 
'importlib_metadata', 'lit', 'literal', 'pa', 'udaf', 'udf']
   
   >>> datafusion.ABCMeta  # <- THIS IS A STANDARD LIBRARY CLASS, NOT PART OF DATAFUSION
   <class 'abc.ABCMeta'>
   
   >>> dir(datafusion.common)  # <- BASED ON THIS, THERE IS NOTHING IN THE MODULE EXCEPT THE MODULE ITSELF AGAIN
   ['__builtins__', '__cached__', '__doc__', '__file__', '__getattr__', 
'__loader__', '__name__', '__package__', '__spec__', 'common']
   
   >>> datafusion.common.common
   <module 'common'>
   
   >>> datafusion.common.SqlTable  # <- BUT WHEN USING THE MODULE THERE ARE ACTUALLY CLASSES
   <class 'datafusion.common.SqlTable'>
   ```
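
To illustrate why names such as `ABCMeta` show up, here is a minimal sketch of the general mechanism, not the actual `datafusion` source. The module name `demo` and its contents are hypothetical. Any name imported at module level becomes a module attribute, so the default `dir()` lists it alongside the module's own classes.

```python
import types

# Hypothetical module source, standing in for a package's __init__.py.
SOURCE = """
from abc import ABCMeta

class Accumulator(metaclass=ABCMeta):
    pass
"""

# Build a throwaway module and run the source in its namespace.
demo = types.ModuleType("demo")
exec(SOURCE, demo.__dict__)

# Without a custom __dir__, dir() reports every name in the module's
# namespace, including the imported ABCMeta.
public = [name for name in dir(demo) if not name.startswith("_")]
print(public)
```

The imported `ABCMeta` is listed next to `Accumulator`, even though only the latter was defined by the module.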
   
   With this PR, `dir()` shows what is actually part of each module, and hides 
the names that are currently displayed only as a side effect:
   
   ```python
   >>> import datafusion
   
   >>> dir(datafusion)
   ['Accumulator', 'AggregateUDF', 'Alias', 'Analyze', 'Between', 'Case', 
'Cast', 'Config', 'CreateMemoryTable', 'CreateView', 'DFSchema', 'DataFrame', 
'Distinct', 'DropTable', 'Exists', 'Explain', 'Expr', 'Extension', 'Filter', 
'GroupingSet', 'ILike', 'InList', 'InSubquery', 'IsFalse', 'IsNotFalse', 
'IsNotNull', 'IsNotTrue', 'IsNotUnknown', 'IsTrue', 'IsUnknown', 'Like', 
'Limit', 'Negative', 'Not', 'Partitioning', 'Placeholder', 'Projection', 
'Repartition', 'RuntimeConfig', 'SQLOptions', 'ScalarSubquery', 'ScalarUDF', 
'ScalarVariable', 'SessionConfig', 'SessionContext', 'SimilarTo', 'Sort', 
'Subquery', 'SubqueryAlias', 'TableScan', 'TryCast', 'Window', 'WindowFrame', 
'__all__', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', 
'__name__', '__package__', '__path__', '__spec__', '__version__', '_internal', 
'col', 'column', 'common', 'expr', 'lit', 'literal', 'udaf', 'udf']
   
   >>> dir(datafusion.common)
   ['DFSchema', 'DataType', 'DataTypeMap', 'PythonType', 'SqlFunction', 
'SqlSchema', 'SqlStatistics', 'SqlTable', 'SqlType', 'SqlView', '__builtins__', 
'__cached__', '__dir__', '__doc__', '__file__', '__getattr__', '__loader__', 
'__name__', '__package__', '__spec__']
   
   ```
   
   Personally, instead of exposing just the `_internal` module from PyO3, I 
would create all these modules and submodules directly from PyO3. I'd use 
`_datafusion` as the main module, and instead of having a `common.py` file, the 
`_datafusion.common` submodule would be created directly from PyO3. That way, 
all this `__getattr__` and `__dir__` magic would not be needed. @andygrove is 
there a reason why it wasn't implemented like this?
   
   Also, I don't think it's good practice to have everything in the main 
namespace. I think it's fine if, for example, users have to use 
`datafusion.expr.IsTrue` instead of `datafusion.IsTrue`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

