jllipatz commented on issue #34923:
URL: https://github.com/apache/arrow/issues/34923#issuecomment-1512547595
Now it works. The whole process uses 58 GB.
But two questions remain:
- Why does the initial program work when the data come (slowly) from an RDS
file, without adding the suggested calls?
- Why do data coming (quickly) from a parquet file take less memory than data
coming from an RDS file? It looks as if what appears to be a data.frame
were hiding some trick, as with 1:1000000 and ALTREP vectors instead of
standard ones.
```
library(tictoc)

# Lookup table with DEP and REGION columns
dep <- rio::import('V:/PALETTES/IGoR/data/dep2014.dbf')

tic()
df <- arrow::read_parquet('V:/PALETTES/parquet/rp68a19.parquet')
toc() # 785s

# Convert to an Arrow Table
tic()
df <- arrow::as_arrow_table(df)
toc() # 0.17s

# Map each DR code to its region label through the lookup table
tic()
df$REGION <-
  factor(df$DR, levels = dep$DEP, labels = dep$REGION) |>
  as.character()
toc() # 17s

# Convert again after adding the new column
tic()
df <- arrow::as_arrow_table(df)
toc() # 0.02s

tic()
arrow::write_parquet(df, 'V:/PALETTES/tmp/rp68a19a.parquet')
toc() # 388s
```
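
To make the second question concrete, here is a minimal base-R sketch of the kind of trick I have in mind. It is only an illustration and says nothing about what arrow or readRDS actually do. `is_compact` is a hypothetical helper built on `.Internal(inspect())`, an internal R function whose output format is not a stable API, and the sketch assumes R >= 3.5, where `1:n` is stored as a compact ALTREP sequence.

```r
# Hypothetical helper (not part of base R or arrow): reports whether R's
# internal inspector labels a vector as a compact ALTREP sequence.
is_compact <- function(v) {
  out <- capture.output(.Internal(inspect(v)))
  any(grepl("compact", out, fixed = TRUE))
}

x <- 1:1000000             # compact ALTREP sequence (assumes R >= 3.5)
x_compact <- is_compact(x) # checked before anything touches x's data

y <- (1:1000000) + 0L      # arithmetic result: an ordinary, fully allocated vector
y_compact <- is_compact(y)

same_values <- identical(x, y)

cat("x compact:", x_compact, "\n")
cat("y compact:", y_compact, "\n")
cat("same values:", same_values, "\n")
```

If the columns returned by `read_parquet` were backed by something similar, two objects could hold the same values while occupying very different amounts of memory. That is only a guess on my side.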