kostovasandra opened a new issue, #38032:
URL: https://github.com/apache/arrow/issues/38032
### Describe the bug, including details regarding any error messages, version, and platform.
I have been using the read_parquet() function to read a file (compressed or uncompressed) from S3, but it is too slow: reading a 700MB file takes about 10 minutes. I tried setting arrow::set_cpu_count(2) and the arrow.use_threads = FALSE option, but it is still slow.
Writing the same file takes 1-2 minutes, which is also not ideal.
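For reference, this is roughly how I applied those settings (a minimal sketch; the values are just the ones mentioned above):

```r
library(arrow)

# Settings I tried: limit Arrow's CPU thread pool and disable
# multithreading when converting to an R data frame.
arrow::set_cpu_count(2)
options(arrow.use_threads = FALSE)
```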
Below is the code:
```r
library(arrow)

# Build an S3 filesystem handle for the bucket from environment variables.
s3_init <- function() {
  env <- Sys.getenv(c("ECS_ACCESS_KEY_ID", "ECS_SECRET_ACCESS_KEY",
                      "ECS_S3_ENDPOINT_URL", "ECS_S3_BUCKET"))
  bucket <- arrow::s3_bucket(env["ECS_S3_BUCKET"],
                             access_key = env["ECS_ACCESS_KEY_ID"],
                             secret_key = env["ECS_SECRET_ACCESS_KEY"],
                             endpoint_override = env["ECS_S3_ENDPOINT_URL"],
                             region = "")
  return(bucket)
}

# Write a data frame to S3 as gzip-compressed Parquet.
s3_save_rds <- function(file, s3_path) {
  bucket <- s3_init()
  write_parquet(file, bucket$path(s3_path), compression = "gzip")
}

# Read a Parquet file from S3 into a data frame.
s3_read_rds <- function(s3_path) {
  bucket <- s3_init()
  file <- read_parquet(bucket$path(s3_path))
  return(file)
}
```
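For completeness, this is how the helpers are called (the S3 key and the toy data frame below are just placeholders; the real file is ~700MB):

```r
# Example usage with a placeholder key and placeholder data.
df <- data.frame(id = seq_len(1e6), value = rnorm(1e6))
s3_save_rds(df, "some-prefix/example.parquet")     # write is ~1-2 minutes for the real file
df2 <- s3_read_rds("some-prefix/example.parquet")  # read is ~10 minutes for the real file
```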
### Component(s)
R