thisisnic opened a new pull request, #12839: URL: https://github.com/apache/arrow/pull/12839
As discussed on #12826.

I'm not sure how (or whether) to write tests for this, but I ran it locally using the CSV directory set up in `test-dataset-csv.R`, both with and without this change. Without it, we get, e.g.:

```
open_dataset(csv_dir)
# Error in `handle_parquet_io_error()` at r/R/dataset.R:221:6:
# ! Invalid: Error creating dataset. Could not read schema from '/tmp/RtmpuTyOD8/file5049dcf581a5/5/file1.csv': Could not open Parquet input source '/tmp/RtmpuTyOD8/file5049dcf581a5/5/file1.csv': Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.
# /home/nic2/arrow/cpp/src/arrow/dataset/file_parquet.cc:323  GetReader(source, scan_options). Is this a 'parquet' file?
# /home/nic2/arrow/cpp/src/arrow/dataset/discovery.cc:40  InspectSchemas(std::move(options))
# /home/nic2/arrow/cpp/src/arrow/dataset/discovery.cc:262  Inspect(options.inspect_options)
# ℹ Did you mean to specify a 'format' other than the default (parquet)?
```

and with it, the error is attributed to `open_dataset()` rather than to the internal helper:

```
open_dataset(csv_dir)
# Error in `open_dataset()`:
# ! Invalid: Error creating dataset. Could not read schema from '/tmp/RtmpLbqZs6/file4e4ca14fb5795/5/file1.csv': Could not open Parquet input source '/tmp/RtmpLbqZs6/file4e4ca14fb5795/5/file1.csv': Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.
# /home/nic2/arrow/cpp/src/arrow/dataset/file_parquet.cc:323  GetReader(source, scan_options). Is this a 'parquet' file?
# /home/nic2/arrow/cpp/src/arrow/dataset/discovery.cc:40  InspectSchemas(std::move(options))
# /home/nic2/arrow/cpp/src/arrow/dataset/discovery.cc:262  Inspect(options.inspect_options)
# ℹ Did you mean to specify a 'format' other than the default (parquet)?
```
