[GitHub] [arrow-datafusion] alamb commented on issue #888: [DataFusion CLI] Support querying CSV files without providing the schema

GitBox Fri, 10 Sep 2021 03:57:16 -0700


alamb commented on issue #888:
URL: https://github.com/apache/arrow-datafusion/issues/888#issuecomment-916816150



   Hi @sum12  -- 
   
   > it looks like the columns definitions are required. 
   
   I am not sure about this -- I think the following works for Parquet (note there are no column definitions), and I was imagining we would do something similar for CSV:
   
   ```sql
   CREATE EXTERNAL TABLE something
   STORED AS PARQUET
   LOCATION '/Users/alamb/Downloads/demo.parquet';
   ```
   
   > Also if we are inferring the schema then the API currently needs to read a set of records to actually do the inference. Do we want to control the number of rows read (default is to read the entire file)?
   
   I think a setting for this (in https://docs.rs/datafusion/5.0.0/datafusion/execution/context/struct.ExecutionConfig.html) would be helpful. Reading a thousand rows is probably a good default: if a CSV file has a detectable schema at all, it will usually be detectable from the initial rows.
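
To illustrate the idea of capping the number of rows used for inference, here is a minimal sketch in plain Python. It is not DataFusion's or Arrow's implementation; the names `infer_type`, `infer_schema`, and `max_rows`, and the type labels, are hypothetical and chosen only for this example. It assumes the file has a header row and widens each column's type as new values are seen, stopping after `max_rows` data rows.

```python
import csv
import io

# Widening order: a column takes the widest type seen in the sampled rows.
_RANK = {"null": 0, "int64": 1, "float64": 2, "utf8": 3}


def infer_type(value):
    """Return the narrowest type label that can represent one CSV field."""
    if value == "":
        return "null"
    try:
        int(value)
        return "int64"
    except ValueError:
        pass
    try:
        float(value)
        return "float64"
    except ValueError:
        return "utf8"


def infer_schema(lines, max_rows=1000):
    """Infer (column name, type label) pairs from at most `max_rows` data rows.

    `lines` is any iterable of CSV lines (for example an open file).
    The first row is assumed to be a header (an assumption of this sketch).
    """
    reader = csv.reader(lines)
    header = next(reader)
    types = ["null"] * len(header)
    for i, row in enumerate(reader):
        if i >= max_rows:
            break  # stop sampling; remaining rows are never read
        for col, value in enumerate(row[: len(header)]):
            candidate = infer_type(value)
            if _RANK[candidate] > _RANK[types[col]]:
                types[col] = candidate
    # Columns with no non-empty sample values fall back to strings.
    types = ["utf8" if t == "null" else t for t in types]
    return list(zip(header, types))


sample = io.StringIO("id,price,name\n1,2.5,apple\n2,3,pear\n")
print(infer_schema(sample))
```

The trade-off is visible in the sketch: a small `max_rows` keeps inference cheap on large files, but a value that first appears after the sampled rows (for example a string in a column that looked numeric) will not be reflected in the inferred type.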


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
