yyy1000 commented on issue #9672: URL: https://github.com/apache/arrow-datafusion/issues/9672#issuecomment-2015307720
I have some updates to share. The `with_column` implementation in DataFusion can't add a new column whose data comes from outside the DataFrame. In this respect it is the same as Spark's implementation, whose documentation (https://spark.apache.org/docs/3.1.3/api/python/reference/api/pyspark.sql.DataFrame.withColumn.html) says:

> Returns a new [DataFrame](https://spark.apache.org/docs/3.1.3/api/python/reference/api/pyspark.sql.DataFrame.html#pyspark.sql.DataFrame) by adding a column or replacing the existing column that has the same name.
>
> The column expression must be an expression over this [DataFrame](https://spark.apache.org/docs/3.1.3/api/python/reference/api/pyspark.sql.DataFrame.html#pyspark.sql.DataFrame); attempting to add a column from some other [DataFrame](https://spark.apache.org/docs/3.1.3/api/python/reference/api/pyspark.sql.DataFrame.html#pyspark.sql.DataFrame) will raise an error.

So using `unnest` is not a solution IMO.

When I tried to implement a new method, I got stuck on how to retrieve the data from a DataFrame. I think a DataFrame in `Polars` consists of a set of columns (see https://docs.rs/polars-core/0.38.3/src/polars_core/frame/mod.rs.html#134), so it looks more like a physical structure. A DataFrame in DataFusion, however, is built on a `LogicalPlan`, so I think it may be different and looks more like a logical one? 🤔

Correct me if I'm wrong, I'm not very familiar with this. :)
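
To make the physical-versus-logical distinction concrete, here is a minimal Rust sketch. Everything in it is an assumption for illustration: `EagerFrame`, `Plan`, `Expr`, `with_external_column`, and `check` are hypothetical, heavily simplified stand-ins, not the actual Polars or DataFusion types or APIs. It only shows why appending raw data is easy when the frame owns its columns, and why a plan-based `with_column` can accept only expressions over columns the plan already knows about.

```rust
// Hypothetical, simplified types for illustration only;
// these are NOT the real Polars or DataFusion APIs.

/// Eager ("physical") frame: owns materialized column data.
#[derive(Debug, Clone, PartialEq)]
struct EagerFrame {
    columns: Vec<(String, Vec<i64>)>,
}

impl EagerFrame {
    /// External data can be appended directly once the lengths agree.
    fn with_external_column(mut self, name: &str, data: Vec<i64>) -> Result<Self, String> {
        if let Some((_, first)) = self.columns.first() {
            if first.len() != data.len() {
                return Err(format!("length mismatch: {} vs {}", first.len(), data.len()));
            }
        }
        self.columns.push((name.to_string(), data));
        Ok(self)
    }
}

/// Expressions that a lazy plan can evaluate later.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Literal(i64),
    Add(Box<Expr>, Box<Expr>),
}

/// Lazy ("logical") frame: a description of a computation, holding no data.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    Scan { fields: Vec<String> },
    Project { input: Box<Plan>, exprs: Vec<(String, Expr)> },
}

/// Reject expressions that reference columns the plan does not produce.
fn check(expr: &Expr, fields: &[String]) -> Result<(), String> {
    match expr {
        Expr::Column(c) if !fields.contains(c) => Err(format!("unknown column `{}`", c)),
        Expr::Column(_) | Expr::Literal(_) => Ok(()),
        Expr::Add(l, r) => {
            check(l, fields)?;
            check(r, fields)
        }
    }
}

impl Plan {
    /// Output column names of this plan node.
    fn fields(&self) -> Vec<String> {
        match self {
            Plan::Scan { fields } => fields.clone(),
            Plan::Project { exprs, .. } => exprs.iter().map(|(n, _)| n.clone()).collect(),
        }
    }

    /// Add a column, or replace the one with the same name, by wrapping
    /// the current plan in a projection. No data is touched here.
    fn with_column(self, name: &str, expr: Expr) -> Result<Plan, String> {
        let fields = self.fields();
        check(&expr, &fields)?;
        let mut exprs: Vec<(String, Expr)> = Vec::new();
        let mut replaced = false;
        for f in &fields {
            if f.as_str() == name {
                exprs.push((f.clone(), expr.clone()));
                replaced = true;
            } else {
                exprs.push((f.clone(), Expr::Column(f.clone())));
            }
        }
        if !replaced {
            exprs.push((name.to_string(), expr));
        }
        Ok(Plan::Project { input: Box::new(self), exprs })
    }
}

fn main() {
    let eager = EagerFrame { columns: vec![("a".to_string(), vec![1, 2, 3])] };
    println!("{:?}", eager.with_external_column("b", vec![10, 20, 30]));

    let scan = Plan::Scan { fields: vec!["a".to_string()] };
    let sum = Expr::Add(Box::new(Expr::Column("a".to_string())), Box::new(Expr::Literal(1)));
    println!("{:?}", scan.clone().with_column("a_plus_1", sum));
    // A column that only exists in some other frame cannot be resolved.
    println!("{:?}", scan.with_column("b", Expr::Column("other_frame_col".to_string())));
}
```

In this sketch the eager frame accepts a raw vector because it already holds the data, while the plan-based frame can only record a projection and fails as soon as an expression refers to a column it does not produce. That mirrors the restriction in the Spark documentation quoted above, under the stated simplifications.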
