[GitHub] [arrow] martindut opened a new issue #8505: open_dataset on folder with gzip files

GitBox Thu, 22 Oct 2020 04:35:49 -0700


martindut opened a new issue #8505:
URL: https://github.com/apache/arrow/issues/8505



   Hi,
   I'm trying to run `open_dataset` on a folder that contains gzipped TSV files:
   `ds <- open_dataset(.path, format = "tsv", delim = "\t", schema = aschema)`
   which returns
   FileSystemDataset with 24 csv files
   valuationdate: int32
   CGTClientID: int32
   CGTInstrumentID: int32
   AiaRecType: int32
   ParcelID: int32
   n: int32
   AiaAdjustAmt: float
   mindate: int32
   maxdate: int32
   
   However, if I call `collect()` on the dataset, I get this error:
   
   > Error in dataset___Scanner__ToTable(self) : 
   >   Invalid: Could not open CSV input source 'C:/inndx/investmentaccountingdata/snapshot/aiaparcelsumm/obelix/v1.0/curo/TPA_UnitTrust/2020/01/06/135844/curo_[TPA_UnitTrust]_20200103_135844.gz': Invalid: CSV parse error: Expected 1 columns, got 2
   
   I can open an individual file with
   ```
   a_df <- read_tsv_arrow(
     file = .file,
     schema = rschema,
     col_names = TRUE,
     skip_empty_rows = TRUE,
     as_data_frame = FALSE
   )
   ```
   and it works perfectly.
   
   Please advise if I'm doing something wrong here.

