jllipatz commented on issue #34923:
URL: https://github.com/apache/arrow/issues/34923#issuecomment-1512547595

   
   Now it works. 58 GB used by the whole process.
   But two questions remain:
   
   - Why does the initial program work when the data come (slowly) from an RDS file, 
without adding the suggested calls?
   - Why do data coming (quickly) from a Parquet file take less memory than data 
coming from an RDS file? It looks as if what appears to be a plain data.frame 
were hiding some trick, such as ALTREP vectors (like 1:1000000) instead of 
standard ones.
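   
   A possible way to check the ALTREP guess, as a minimal sketch: base R's 
   `.Internal(inspect())` prints the internal header of a vector, so an ALTREP 
   vector can be told apart from a regular one. `inspect_columns` is a 
   hypothetical helper written for this illustration only; it is not part of 
   arrow or of the script in this comment.

   ```r
   # Sketch only: compare an ALTREP vector with a regular one.
   x <- 1:1000000          # compact ALTREP integer sequence (R >= 3.5.0)
   y <- x + 0L             # same values, stored as a regular materialized vector
   .Internal(inspect(x))   # the header line should mention "compact"
   .Internal(inspect(y))   # header followed by the first few values

   # Hypothetical helper: print R's internal header for the first n columns.
   inspect_columns <- function(data, n = 3) {
     for (name in head(names(data), n)) {
       cat(name, ":\n")
       .Internal(inspect(data[[name]]))
     }
     invisible(NULL)
   }
   ```

   It could be called as `inspect_columns(df)` while `df` is still a plain 
   data.frame, once right after `arrow::read_parquet()` and once after loading 
   the RDS file. If the Parquet columns turn out to be ALTREP, the 
   `arrow.use_altrep` option (as far as the arrow R documentation describes it) 
   could be set to FALSE before reading, to compare memory use.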
   
   ```
   library(tictoc)
   dep <- rio::import('V:/PALETTES/IGoR/data/dep2014.dbf')
   
   tic()
   df <- arrow::read_parquet('V:/PALETTES/parquet/rp68a19.parquet')
   toc() # 785s
   
   
   tic()
   df <- arrow::as_arrow_table(df)
   toc() # 0.17s
   tic()
   df$REGION <- 
     factor(df$DR,levels=dep$DEP,labels=dep$REGION) |> 
     as.character()
   toc() # 17s
   
   tic()
   df <- arrow::as_arrow_table(df)
   toc() #0.02s
   
   tic()
   arrow::write_parquet(df,'V:/PALETTES/tmp/rp68a19a.parquet')
   toc() #388s
   ```

