thisisnic commented on issue #45601:
URL: https://github.com/apache/arrow/issues/45601#issuecomment-2908121045

   Thanks for trying it out and confirming the file format!  Another 
temporary workaround, which may be quicker than the `map_batches()` 
solution, would be to rewrite the dataset without the labels. 
   
   ```
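   # Rewrite the dataset once, casting the labelled column to a plain integer
   open_dataset(whatever) %>%
     mutate(col = cast(col, int32())) %>%
     write_dataset(newlocation)
   
   # Filter on the rewritten dataset, where `col` is now an ordinary integer
   open_dataset(newlocation) %>%
     filter(col > 3) %>%
     collect()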
   ```
   
   Again, I'm thinking of this as a workaround. Ideally we could just operate 
on the underlying storage type (i.e. an integer), but that is a much bigger 
design decision.  Just dropping the labels is a bit problematic as it breaks 
roundtrip fidelity (i.e. some people might want to be able to read and write 
the data without losing the labels).  Will keep iterating on a solution!

