TatianaJin commented on issue #38676:
URL: https://github.com/apache/arrow/issues/38676#issuecomment-2309601610
> Ok, so two problems here.
>
> First: "cannot infer number of columns". This is because the file has no
> newline at all. If you add a newline at the end, the error disappears. I wonder
> if such files exist in the wild, but would be good to add support for them.

This issue still exists in `apache-arrow-15.0.0` (I built arrow from source
using this tag). The following code should reproduce the bug.
```cpp
#include <fstream>
#include <iostream>
#include <memory>
#include <ostream>
#include <string>

#include <arrow/csv/api.h>
#include <arrow/filesystem/localfs.h>

int main() {
  auto csv_file = "CSVReaderTest.csv";
  {  // generate test file
    std::ofstream ostream(csv_file);
    std::string data = "a,b\n0,1";
    // no new line at the end
    ostream.write(data.data(), data.size());
    ostream.close();
  }

  // options
  auto read_options = arrow::csv::ReadOptions::Defaults();
  // skip the header row as the file has column names, and we want to
  // generate column names by index.
  read_options.skip_rows = 1;
  read_options.autogenerate_column_names = true;
  auto parse_options = arrow::csv::ParseOptions::Defaults();
  auto convert_options = arrow::csv::ConvertOptions::Defaults();

  auto arrow_fs = std::make_shared<::arrow::fs::LocalFileSystem>();
  auto random_access_file = arrow_fs->OpenInputFile(csv_file).ValueOrDie();

  // die on this statement
  auto record_batch_reader =
      arrow::csv::StreamingReader::Make(arrow::io::default_io_context(),
                                        random_access_file, read_options,
                                        parse_options, convert_options)
          .ValueOrDie();
  std::cout << record_batch_reader->ToTable().ValueOrDie()->ToString()
            << std::endl;
  return 0;
}
```
The outcome is like this:

I think the problem might be here in `ProcessHeader` (I tried to look into
the code, but I am still new to it):
https://github.com/apache/arrow/blob/51e9f70f94cd09a0a08196afdd2f4fc644666b5e/cpp/src/arrow/csv/reader.cc#L609
The block is actually final in this case, but the call to `Parse` treats it as
non-final (`is_final` is false). The only data row is therefore dropped, and we
get the `cannot infer number of columns` error.
https://github.com/apache/arrow/blob/a61f4af724cd06c3a9b4abd20491345997e532c0/cpp/src/arrow/csv/parser.cc#L403
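To illustrate what I mean, here is a toy model of the behavior I suspect. This is not Arrow's parser: `CountParsedRows` is a hypothetical helper I made up for this comment, and it only assumes that a row without a trailing newline is emitted when the block is marked final.

```cpp
#include <cstddef>
#include <string>

// Toy model, NOT Arrow's actual parser: counts the rows a chunked parser
// would emit from one block. A row is complete when it ends with '\n'.
// Trailing bytes without a newline count as a row only if the caller
// marks the block as the final one.
std::size_t CountParsedRows(const std::string& block, bool is_final) {
  std::size_t rows = 0;
  std::size_t row_start = 0;  // index where the current (unfinished) row begins
  for (std::size_t i = 0; i < block.size(); ++i) {
    if (block[i] == '\n') {
      ++rows;
      row_start = i + 1;
    }
  }
  // Unterminated trailing row: kept only when this is the last block.
  if (is_final && row_start < block.size()) {
    ++rows;
  }
  return rows;
}
```

With the header skipped, the remaining block in my test file is `0,1`. In this model `CountParsedRows("0,1", false)` yields no rows, while `CountParsedRows("0,1", true)` yields one, which would match the error above if the final block is parsed as non-final.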
@jorisvandenbossche Please help look into this. Thanks!