martindut opened a new issue #8505:
URL: https://github.com/apache/arrow/issues/8505
Hi,
I'm trying to run `open_dataset` on a folder that contains gzipped TSV files:
`ds <- open_dataset(.path, format = "tsv", delim = "\t", schema = aschema)`
which returns
FileSystemDataset with 24 csv files
valuationdate: int32
CGTClientID: int32
CGTInstrumentID: int32
AiaRecType: int32
ParcelID: int32
n: int32
AiaAdjustAmt: float
mindate: int32
maxdate: int32
However, if I call `collect()` on the dataset, I get this error:
> Error in dataset___Scanner__ToTable(self) :
> Invalid: Could not open CSV input source
'C:/inndx/investmentaccountingdata/snapshot/aiaparcelsumm/obelix/v1.0/curo/TPA_UnitTrust/2020/01/06/135844/curo_[TPA_UnitTrust]_20200103_135844.gz':
Invalid: CSV parse error: Expected 1 columns, got 2
I can open an individual file with
```
a_df <- read_tsv_arrow(
  file = .file,
  schema = rschema,
  col_names = TRUE,
  skip_empty_rows = TRUE,
  as_data_frame = FALSE
)
```
and it works perfectly.
Please advise if I'm doing something wrong here.