elgabbas commented on issue #34291:
URL: https://github.com/apache/arrow/issues/34291#issuecomment-1442191856

   Thanks @eitsupi ... This did not help in my case: loading the data consumed too much memory and crashed my PC.
   
   One possible solution is to loop through the values of one of the columns, filter the data on each value, and write each subset to disk manually. I will try this and report back.
   ```
   arrow::open_dataset(sources = Path, format = "csv", delim = "\t", quote = "") %>%
       # some filtering %>%
       arrow::write_dataset(path = OutPath, max_open_files = 100L,
                            max_rows_per_file = 1000L, format = "arrow")
   ```
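
   The per-value loop could be sketched roughly as below. This is only a sketch of the idea, not tested code: `Path` and `OutPath` are the same placeholders as above, and `Species` is a made-up column name standing in for whichever column is looped over.
   ```
   library(arrow)
   library(dplyr)
   
   ds <- arrow::open_dataset(sources = Path, format = "csv",
                             delim = "\t", quote = "")
   
   # First collect only the distinct values of the chosen column
   vals <- ds %>%
       distinct(Species) %>%
       collect() %>%
       pull(Species)
   
   # Then filter and write one subset at a time, keeping memory use low
   for (v in vals) {
       ds %>%
           filter(Species == v) %>%
           arrow::write_dataset(path = file.path(OutPath, v),
                                max_rows_per_file = 1000L,
                                format = "arrow")
   }
   ```
   Note that `write_dataset()` also accepts a `partitioning` argument, which may achieve the same split in a single call without an explicit loop.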


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
