alamb commented on issue #888: URL: https://github.com/apache/arrow-datafusion/issues/888#issuecomment-916816150
Hi @sum12 --

> it looks like the columns definitions are required.

I am not sure about this -- I think the following works (note there are no column definitions). I was imagining we would do something similar for CSV

```sql
CREATE EXTERNAL TABLE something
STORED AS PARQUET
LOCATION '/Users/alamb/Downloads/demo.parquet';
```

> Also if we are inferring the schema then the API currently needs to read a set of records to actually do the inference. Do we want to control the number of rows read (default is to read the entire file) ?

I think having a setting (in https://docs.rs/datafusion/5.0.0/datafusion/execution/context/struct.ExecutionConfig.html) would be helpful. I think reading a thousand rows is probably a good default (as most CSV files will have easily detectable schemas in their initial rows if at all)

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
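
To illustrate the row-limit idea discussed in the comment, here is a minimal, standard-library-only sketch of inferring a CSV schema from at most `max_rows` data rows. It is not DataFusion's or Arrow's implementation: `ColumnType`, `classify`, and `infer_schema` are hypothetical names, and the sketch assumes a header row and plain comma-separated fields with no quoting.

```rust
// Sketch only: hypothetical names, not the DataFusion or Arrow API.

/// Column types this sketch can detect, declared from most to least specific
/// so that the derived ordering can be used to widen a column's type.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum ColumnType {
    Int64,
    Float64,
    Utf8,
}

/// Classify a single non-empty CSV field.
fn classify(field: &str) -> ColumnType {
    if field.parse::<i64>().is_ok() {
        ColumnType::Int64
    } else if field.parse::<f64>().is_ok() {
        ColumnType::Float64
    } else {
        ColumnType::Utf8
    }
}

/// Infer (name, type) pairs from the header plus at most `max_rows` data rows.
/// Assumes a header row and simple comma-separated fields (no quoting).
pub fn infer_schema(csv: &str, max_rows: usize) -> Vec<(String, ColumnType)> {
    let mut lines = csv.lines();
    let names: Vec<String> = match lines.next() {
        Some(header) => header.split(',').map(|s| s.trim().to_string()).collect(),
        None => return Vec::new(),
    };

    // `None` means no non-empty value has been seen for that column yet.
    let mut types: Vec<Option<ColumnType>> = vec![None; names.len()];

    // Only the first `max_rows` data rows are inspected.
    for line in lines.take(max_rows) {
        for (i, field) in line.split(',').enumerate().take(names.len()) {
            let field = field.trim();
            if field.is_empty() {
                continue; // treat empty fields as nulls
            }
            let seen = classify(field);
            // Widen to the least specific type observed so far.
            types[i] = Some(match types[i] {
                Some(current) => current.max(seen),
                None => seen,
            });
        }
    }

    // Columns with no observed values fall back to strings.
    names
        .into_iter()
        .zip(types)
        .map(|(name, ty)| (name, ty.unwrap_or(ColumnType::Utf8)))
        .collect()
}
```

With a limit, a value that first appears after the sampled rows (for example a non-numeric string in an otherwise integer column) is not reflected in the inferred type. That is the trade-off behind picking a default such as a thousand rows instead of reading the entire file.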
