EinMaulwurf commented on issue #45601:
URL: https://github.com/apache/arrow/issues/45601#issuecomment-2907906623

   Hi,
   
   thanks for working on this problem. My dataset consisted of a single 
parquet file. I just ended up reading it into R completely 
(`arrow::read_parquet()`) and then removing all the labels 
(`haven::zap_labels()` and similar) before continuing work. I did not need the 
labels; they were just in the dataset, probably because it was exported from 
Stata, where labels are more commonly used (or so I've heard).
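   For reference, the workaround looks roughly like this (a sketch, assuming 
the `arrow` and `haven` packages are installed; `data.parquet` is a 
placeholder for my actual file):

   ```r
   library(arrow)
   library(haven)

   # Read the whole Parquet file into an R data frame
   # (feasible here because the dataset is a single file).
   df <- arrow::read_parquet("data.parquet")

   # Strip Stata-style value labels, then variable labels,
   # so no labelled columns remain for later arrow operations.
   df <- haven::zap_labels(df)
   df <- haven::zap_label(df)
   ```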
   
   I also just tested your suggestion with `map_batches()`. It works, but it 
is almost 100 times slower than running a "normal" pipeline (without 
`map_batches()`) on the same dataset with the labels removed.
   
   For my case, I would prefer a solution where arrow just drops or ignores 
all labels (perhaps with a warning).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
