Re: [PR] Enable Dataframe to be converted into views which can be used in register_table [datafusion-python]

via GitHub Fri, 07 Mar 2025 05:15:51 -0800


timsaucer commented on code in PR #1016:
URL: https://github.com/apache/datafusion-python/pull/1016#discussion_r1985040308



##########
src/dataframe.rs:
##########
@@ -50,9 +52,79 @@ use crate::{
     expr::{sort_expr::PySortExpr, PyExpr},
 };
 
+// https://github.com/apache/datafusion-python/pull/1016#discussion_r1983239116
+// - we have not decided on the table_provider approach yet
+// this is an interim implementation
+#[pyclass(name = "TableProvider", module = "datafusion")]
+pub struct PyTableProvider {
+    provider: Arc<dyn TableProvider>,
+}
+
+impl PyTableProvider {
+    pub fn new(provider: Arc<dyn TableProvider>) -> Self {
+        Self { provider }
+    }
+
+    pub fn as_table(&self) -> PyTable {
+        let table_provider: Arc<dyn TableProvider> = self.provider.clone();
+        PyTable::new(table_provider)
+    }
+}
+
 /// A PyDataFrame is a representation of a logical plan and an API to compose statements.
 /// Use it to build a plan and `.collect()` to execute the plan and collect the result.
 /// The actual execution of a plan runs natively on Rust and Arrow on a multi-threaded environment.
+///
+/// # Methods
+///
+/// - `new`: Creates a new PyDataFrame.
+/// - `__getitem__`: Enable selection for `df[col]`, `df[col1, col2, col3]`, and `df[[col1, col2, col3]]`.
+/// - `__repr__`: Returns a string representation of the DataFrame.
+/// - `_repr_html_`: Returns an HTML representation of the DataFrame.
+/// - `describe`: Calculate summary statistics for a DataFrame.
+/// - `schema`: Returns the schema from the logical plan.
+/// - `into_view`: Convert this DataFrame into a Table that can be used in register_table. We have not finalized on PyTableProvider approach yet.
+/// - `select_columns`: Select columns from the DataFrame.
+/// - `select`: Select expressions from the DataFrame.
+/// - `drop`: Drop columns from the DataFrame.
+/// - `filter`: Filter the DataFrame based on a predicate.
+/// - `with_column`: Add a new column to the DataFrame.
+/// - `with_columns`: Add multiple new columns to the DataFrame.
+/// - `with_column_renamed`: Rename a column in the DataFrame.
+/// - `aggregate`: Aggregate the DataFrame based on group by and aggregation expressions.
+/// - `sort`: Sort the DataFrame based on expressions.
+/// - `limit`: Limit the number of rows in the DataFrame.
+/// - `collect`: Executes the plan, returning a list of `RecordBatch`es.

Review Comment:
   I don't think we need this whole section. The methods of the class should 
auto-populate into `help` and the online documentation. Do you think this is 
necessary? It's another thing we'd have to maintain manually over time.
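
As a rough illustration of the auto-population point, here is a minimal sketch. It uses a hypothetical pure-Python stand-in class, not the real PyO3-backed `DataFrame`, and the method names and docstrings are borrowed from the list above purely for illustration. It shows that the text `help()` prints is assembled from the per-method docstrings, so a hand-written method index in the class docs would duplicate it.

```python
import pydoc


class DataFrame:
    """Hypothetical stand-in for the PyO3-backed DataFrame class."""

    def schema(self):
        """Return the schema from the logical plan."""
        return []

    def limit(self, count):
        """Limit the number of rows in the DataFrame."""
        return self


# pydoc.render_doc builds the same text that help(DataFrame) displays;
# the plaintext renderer avoids terminal bold/overstrike characters.
help_text = pydoc.render_doc(DataFrame, renderer=pydoc.plaintext)
print(help_text)
```

Both method docstrings appear in the rendered help even though the class-level docstring does not mention them. With PyO3, the `///` comments on the Rust methods play the role of these docstrings.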



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org