JasperSch opened a new issue #11934:
URL: https://github.com/apache/arrow/issues/11934


   When writing a dataset to S3 as parquet files using `write_dataset`, I get download errors when retrieving the files afterwards:
   `Error: 'PAR1���2�2L���' does not exist in current working directory ('/tmp/Rtmpk1pQuU').`
   Despite the errors, the files do still get downloaded.
   The errors do not seem to occur when I use `write_dataset` locally and upload the files to S3 manually with `aws.s3::put_object`.
   They also stop occurring if I re-upload the downloaded files.
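   For reference, the write-locally-then-upload workaround mentioned above could look roughly like this (a sketch, not tested against a live bucket; `bucket` and `prefix` are placeholders as in the MWE below, and AWS credentials are assumed to be configured):

   ```r
   # Sketch of the workaround: write the dataset to a local temp directory,
   # then upload the resulting parquet file manually with aws.s3::put_object.
   bucket <- 'xxx'  # placeholder
   prefix <- 'yyy'  # placeholder
   data   <- data.frame(x = letters[1:5])

   tmp <- file.path(tempdir(), "test_parquet")
   arrow::write_dataset(dataset = data, path = tmp)

   aws.s3::put_object(
       file   = file.path(tmp, "part-0.parquet"),
       object = paste(prefix, "test_parquet/part-0.parquet", sep = "/"),
       bucket = bucket)
   ```

   Files uploaded this way download cleanly with `aws.s3::save_object`, which is what makes the `write_dataset`-to-S3 path look like the culprit.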
   
   System info:
   
   R version 3.6.3 
   arrow 6.0.1
   aws.s3 0.3.21
   
   MWE:
   
   ```r
   # You need an s3 backend to run this.
   bucket <- 'xxx'
   prefix <- 'yyy'
   
   data <- data.frame(
        x = letters[1:5]
       )
   
   arrow::write_dataset(
       dataset = data,
       path = file.path(
           "s3:/",
           bucket,
           prefix,
           "test_parquet"))
   
   # s3ObjectURI() is a local helper that builds the object's s3:// URI
   ref <- s3ObjectURI(bucket, c(prefix, "test_parquet/part-0.parquet"))
   aws.s3::save_object(
       object = ref,
       file = "test"
       )
   
   # Here an error is thrown, although the file is still downloaded without problems:
   # Error: 'PAR122L' does not exist in current working directory ('/tmp/Rtmpk1pQuU').
       
   retrievedData <- dplyr::collect(arrow::open_dataset('test'))
   print(retrievedData)
   
   ```
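   As a possible workaround for the read path (a sketch, assuming arrow was built with S3 support and credentials are configured), the dataset can also be read back directly with arrow's own S3 filesystem, skipping the `aws.s3` download entirely:

   ```r
   # Read the dataset straight from S3 with arrow instead of downloading
   # it first via aws.s3. bucket and prefix are the same placeholders as above.
   retrievedData <- dplyr::collect(
       arrow::open_dataset(
           file.path("s3:/", bucket, prefix, "test_parquet")))
   print(retrievedData)
   ```

   Since `arrow::open_dataset` accepts `s3://` URIs, this also helps narrow the bug down: if the data reads back correctly here, the parquet files themselves are intact and the error is in how `aws.s3` handles the downloaded object.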

