jorgecarleitao commented on issue #906: URL: https://github.com/apache/arrow-datafusion/issues/906#issuecomment-913791972
It was named after Spark's [createDataFrame](https://spark.apache.org/docs/3.1.1/api/python/reference/api/pyspark.sql.SparkSession.createDataFrame.html#pyspark-sql-sparksession-createdataframe), restyled for Python, which does not use camel case for function names; at that point, though, I was more focused on making UDFs and UDAFs work with zero copy.

I did not use "table" because there is no semantic difference between a "table" and a "dataframe", and "dataframe" ended up being the de facto way of expressing "a programmatic Excel sheet" in Python, R, etc.

I usually think of Python for ETL because the DataFrame API allows a more idiomatic way of expressing "chunks of SQL", testing those chunks, etc., and it is less prone to SQL injection than the typical `"SELECT * FROM {}".format(table_name)`. That is why I placed the DataFrame API at the core of how to manage tables and left SQL _in Python_ for expressions, like Pandas does.

I guess it depends on what the target audience is, and so maybe both?
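As a minimal sketch of the injection point being made above: string-formatting splices untrusted input straight into SQL text, whereas a DataFrame-style entry point treats the table name as plain data. `Context` and its `table()` method here are hypothetical stand-ins for a DataFrame API (in the spirit of DataFusion's or Spark's session objects), not the actual implementation.

```python
# A malicious "table name" a user might supply.
table_name = "users; DROP TABLE users; --"

# String formatting splices the value directly into the SQL text:
query = "SELECT * FROM {}".format(table_name)
# -> "SELECT * FROM users; DROP TABLE users; --"

class Context:
    """Hypothetical minimal context: table names are opaque keys, never SQL."""

    def __init__(self, tables):
        self._tables = tables

    def table(self, name):
        # The name is used only as a dictionary key; it is never parsed
        # as SQL, so a malicious value simply fails the lookup.
        if name not in self._tables:
            raise KeyError("no such table: {!r}".format(name))
        return self._tables[name]

ctx = Context({"users": [("alice",), ("bob",)]})
ctx.table("users")  # a normal lookup succeeds
try:
    ctx.table(table_name)  # the injection attempt is just an unknown key
except KeyError as exc:
    print("rejected:", exc)
```

The design point is that the programmatic API gives the attacker no string to break out of: the table name never passes through a SQL parser.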
