jorgecarleitao commented on issue #906: URL: https://github.com/apache/arrow-datafusion/issues/906#issuecomment-913791972
It was named after Spark's [createDataFrame](https://spark.apache.org/docs/3.1.1/api/python/reference/api/pyspark.sql.SparkSession.createDataFrame.html#pyspark-sql-sparksession-createdataframe), restyled for Python, which does not use camel case for function names; at that point, though, I was more focused on making UDFs and UDAFs work with zero copy.

I did not use "table" because there is no semantic difference between a "table" and a "dataframe", and "dataframe" ended up being the de facto way of expressing "a programmatic Excel sheet" in Python, R, etc.

I usually think of Python for ETL because the DataFrame API allows a more idiomatic way of expressing "chunks of SQL", testing those chunks, etc., and it is less prone to SQL injection than the typical `"SELECT * FROM {}".format(table_name)`. That is why I placed the DataFrame API at the core of how to manage tables and left SQL _in Python_ for expressions, like Pandas does.

I guess it depends on what the target audience is, and so maybe both?
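As a minimal sketch of the injection point being made above: string-formatting splices untrusted input straight into SQL text, whereas a DataFrame-style entry point treats the table name as plain data. `Context` and its `table()` method here are hypothetical stand-ins for a DataFrame API (in the spirit of DataFusion's or Spark's session objects), not the actual implementation.

```python
# A malicious "table name" a user might supply.
table_name = "users; DROP TABLE users; --"

# String formatting splices the value directly into the SQL text:
query = "SELECT * FROM {}".format(table_name)
# -> "SELECT * FROM users; DROP TABLE users; --"

class Context:
    """Hypothetical minimal context: table names are opaque keys, never SQL."""

    def __init__(self, tables):
        self._tables = tables

    def table(self, name):
        # The name is used only as a dictionary key; it is never parsed
        # as SQL, so a malicious value simply fails the lookup.
        if name not in self._tables:
            raise KeyError("no such table: {!r}".format(name))
        return self._tables[name]

ctx = Context({"users": [("alice",), ("bob",)]})
ctx.table("users")  # a normal lookup succeeds
try:
    ctx.table(table_name)  # the injection attempt is just an unknown key
except KeyError as exc:
    print("rejected:", exc)
```

The design point is that the programmatic API gives the attacker no string to break out of: the table name never passes through a SQL parser.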
