blongworth commented on issue #39912:
URL: https://github.com/apache/arrow/issues/39912#issuecomment-2438073930

   This issue is definitely related to data size. Breaking it up into groups 
smaller than 170M rows works with my data.
   
   Here's a working DuckDB solution:
   
   ```
   library(DBI)
   library(duckdb)
   library(dplyr)
   library(arrow)

   # On-disk database so DuckDB can spill intermediate results to disk
   con <- dbConnect(duckdb(), dbdir = "my-db.duckdb", read_only = FALSE)

   ds_filt <- ds_filt |>
     to_duckdb(con = con) |>
     group_by(timestamp) |>
     mutate(duplicate = n()) |>   # count rows sharing each timestamp
     filter(duplicate == 1) |>    # keep timestamps that occur exactly once
     ungroup() |>
     to_arrow()
   ```
   
   DuckDB needs an on-disk store to finish this, at least on my 32GB Mac.
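   For reference, a minimal sketch of the two connection modes (the exact memory behavior at this scale is from my runs above, not a general guarantee): `dbConnect(duckdb())` with no `dbdir` gives an in-memory database, while passing a `dbdir` path gives the on-disk store that lets DuckDB spill intermediate results.

   ```
   library(DBI)
   library(duckdb)

   # In-memory (default) -- exhausted memory at this data size on a 32GB machine
   con_mem <- dbConnect(duckdb())

   # On-disk store -- allows spilling to disk, which made the query finish
   con_disk <- dbConnect(duckdb(), dbdir = "my-db.duckdb", read_only = FALSE)

   # Shut down cleanly when done
   dbDisconnect(con_mem, shutdown = TRUE)
   dbDisconnect(con_disk, shutdown = TRUE)
   ```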

