thisisnic commented on issue #45601:
URL: https://github.com/apache/arrow/issues/45601#issuecomment-2907773208

   Hi @EinMaulwurf, would this temporary workaround be fast enough for the 
kind of datasets you're working with, or is it too slow?  You can use 
`map_batches()` to convert the labelled column to a data type that Arrow can 
work with, though it will be slower because the conversion runs in R rather 
than in Arrow.  Here's an example with a small dataset.
   
   ```
   library(haven)
   library(arrow)
   library(tibble)
   library(dplyr)
   
   d <- tibble(
     a = labelled(x = 1:5, label = "example variable a"),
     b = labelled(x = 11:15, label = "example variable b")
   )
   
   tf <- tempfile()
   write_parquet(d, tf)
   open_dataset(tf) %>%
     map_batches(~mutate(., a = as.integer(a))) %>%   # remove labels
     filter(a > 3) %>%
     collect() %>%
     mutate(a = labelled(a, label = "example variable a")) # restore labels on output data
   ```
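
   If several columns are labelled, the restore step could be generalised 
along these lines.  This is only a sketch: `var_labels` and `out` are 
hypothetical names, and `out` is a hand-built stand-in for the collected 
result of the Arrow query, so the snippet covers just the capture/restore 
part in plain R.

   ```
   library(haven)
   library(tibble)

   d <- tibble(
     a = labelled(x = 1:5, label = "example variable a"),
     b = labelled(x = 11:15, label = "example variable b")
   )

   # Capture each column's variable label before the data goes through Arrow
   var_labels <- lapply(d, function(col) attr(col, "label", exact = TRUE))

   # Stand-in for the collected query result: plain integer columns, no labels
   out <- tibble(a = 4:5, b = 14:15)

   # Re-attach the saved variable labels to the matching columns
   for (nm in intersect(names(out), names(var_labels))) {
     out[[nm]] <- labelled(out[[nm]], label = var_labels[[nm]])
   }
   ```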


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.