datapythonista opened a new pull request, #751:
URL: https://github.com/apache/datafusion-python/pull/751
Currently, what `dir()` returns for the `datafusion` module does not match what is actually inside the module. For example:

```python
>>> import datafusion
>>> dir(datafusion)
['ABCMeta', 'Accumulator', 'AggregateUDF', 'Alias', 'Analyze', 'Between', 'Case', 'Cast', 'Config', 'CreateMemoryTable', 'CreateView', 'DFSchema', 'DataFrame', 'Distinct', 'DropTable', 'Exists', 'Explain', 'Expr', 'Extension', 'Filter', 'GroupingSet', 'ILike', 'InList', 'InSubquery', 'IsFalse', 'IsNotFalse', 'IsNotNull', 'IsNotTrue', 'IsNotUnknown', 'IsTrue', 'IsUnknown', 'Like', 'Limit', 'List', 'Negative', 'Not', 'Partitioning', 'Placeholder', 'Projection', 'Repartition', 'RuntimeConfig', 'SQLOptions', 'ScalarSubquery', 'ScalarUDF', 'ScalarVariable', 'SessionConfig', 'SessionContext', 'SimilarTo', 'Sort', 'Subquery', 'SubqueryAlias', 'TableScan', 'TryCast', 'Window', 'WindowFrame', '__all__', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__path__', '__spec__', '__version__', '_internal', 'abstractmethod', 'col', 'column', 'common', 'expr', 'importlib_metadata', 'lit', 'literal', 'pa', 'udaf', 'udf']
>>> datafusion.ABCMeta # <- THIS IS A STANDARD LIBRARY CLASS, NOT PART OF DATAFUSION
<class 'abc.ABCMeta'>
>>> dir(datafusion.common) # <- BASED ON THIS, THERE IS NOTHING IN THE MODULE EXCEPT THE MODULE ITSELF AGAIN
['__builtins__', '__cached__', '__doc__', '__file__', '__getattr__', '__loader__', '__name__', '__package__', '__spec__', 'common']
>>> datafusion.common.common
<module 'common'>
>>> datafusion.common.SqlTable # <- BUT WHEN USING THE MODULE THERE ARE ACTUALLY CLASSES
<class 'datafusion.common.SqlTable'>
```

With this PR, I show what is actually part of each module and hide the names that are currently displayed only as a side effect (`ABCMeta`, `List`, `abstractmethod`, `importlib_metadata` and `pa`):

```python
>>> import datafusion
>>> dir(datafusion)
['Accumulator', 'AggregateUDF', 'Alias', 'Analyze', 'Between', 'Case', 'Cast', 'Config', 'CreateMemoryTable', 'CreateView', 'DFSchema', 'DataFrame', 'Distinct', 'DropTable', 'Exists', 'Explain', 'Expr', 'Extension', 'Filter', 'GroupingSet', 'ILike', 'InList', 'InSubquery', 'IsFalse', 'IsNotFalse', 'IsNotNull', 'IsNotTrue', 'IsNotUnknown', 'IsTrue', 'IsUnknown', 'Like', 'Limit', 'Negative', 'Not', 'Partitioning', 'Placeholder', 'Projection', 'Repartition', 'RuntimeConfig', 'SQLOptions', 'ScalarSubquery', 'ScalarUDF', 'ScalarVariable', 'SessionConfig', 'SessionContext', 'SimilarTo', 'Sort', 'Subquery', 'SubqueryAlias', 'TableScan', 'TryCast', 'Window', 'WindowFrame', '__all__', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__path__', '__spec__', '__version__', '_internal', 'col', 'column', 'common', 'expr', 'lit', 'literal', 'udaf', 'udf']
>>> dir(datafusion.common)
['DFSchema', 'DataType', 'DataTypeMap', 'PythonType', 'SqlFunction', 'SqlSchema', 'SqlStatistics', 'SqlTable', 'SqlType', 'SqlView', '__builtins__', '__cached__', '__dir__', '__doc__', '__file__', '__getattr__', '__loader__', '__name__', '__package__', '__spec__']
```

Personally, instead of exposing just the `_internal` module from PyO3, I would create all these modules and submodules directly from PyO3. I'd use `_datafusion` as the main module, but instead of having a `common.py` file, the `_datafusion.common` submodule would be created directly from PyO3. That way, none of this `__getattr__` and `__dir__` magic would be needed. @andygrove, is there a reason why it wasn't implemented like this?

Also, I don't think it's good practice to put everything in the main namespace. I think it's fine that, for example, users have to use `datafusion.expr.IsTrue` instead of `datafusion.IsTrue`.
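The stray entries in the first `dir(datafusion)` listing (`ABCMeta`, `abstractmethod`, `pa`, ...) come from top-level imports. On a module without a `__dir__` hook, `dir()` lists every name bound in the module's namespace, and `__all__` does not filter that list. The PR's exact mechanism for hiding these names isn't shown here. The sketch below runs some hypothetical module source inside a throwaway `types.ModuleType` and shows one possible approach: deleting the helper names once they have been used.

```python
import types

# Hypothetical module source, loosely modelled on a package __init__.py that
# imports helpers at top level (this is NOT the actual datafusion source).
SOURCE = """
from abc import ABCMeta, abstractmethod

__all__ = ["Accumulator"]


class Accumulator(metaclass=ABCMeta):
    @abstractmethod
    def update(self, values):
        ...
"""

demo = types.ModuleType("demo")
exec(SOURCE, demo.__dict__)

# dir() reports every name bound in the module namespace, imports included;
# __all__ only controls what `from demo import *` exports.
print("ABCMeta" in dir(demo))     # True
print("ABCMeta" in demo.__all__)  # False

# One way to keep helpers out of dir(): remove them once the module is set up.
# The class keeps working because its metaclass was already applied.
del demo.ABCMeta, demo.abstractmethod
print("ABCMeta" in dir(demo))      # False
print("Accumulator" in dir(demo))  # True
```

Renaming the import to `_ABCMeta` would not hide it, because `dir()` also lists underscore-prefixed names (`_internal` appears in both listings). The underscore only signals that the name is private.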
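For comparison, the alternative proposed in the PR would give `_datafusion.common` its classes as plain attributes of a real module object. Then `dir()` is correct without any `__getattr__` or `__dir__` hook. The sketch below is a pure-Python simulation of what such an extension might build; it is not PyO3 code, and the module contents are hypothetical. It also assumes the extension would register the submodule in `sys.modules`. A submodule that is merely attached as an attribute is not found by `import _datafusion.common`, because the parent has no `__path__` and is therefore not a package.

```python
import importlib
import sys
import types

# Pure-Python stand-in for a native top-level module with a real `common`
# submodule attached (names and contents are hypothetical).
_datafusion = types.ModuleType("_datafusion")
common = types.ModuleType("_datafusion.common")
common.SqlTable = type("SqlTable", (), {"__module__": "_datafusion.common"})
_datafusion.common = common
sys.modules["_datafusion"] = _datafusion

# Attaching the submodule as an attribute is not enough for dotted imports:
# the parent has no __path__, so the import system reports it is not a package.
try:
    importlib.import_module("_datafusion.common")
except ModuleNotFoundError as exc:
    print("import failed:", exc)

# Registering the submodule in sys.modules makes the dotted import succeed.
sys.modules["_datafusion.common"] = common
mod = importlib.import_module("_datafusion.common")

# dir() now reflects the real contents, with no __getattr__/__dir__ hooks.
print("SqlTable" in dir(mod))     # True
print("__getattr__" in dir(mod))  # False
```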