elgabbas commented on issue #34291: URL: https://github.com/apache/arrow/issues/34291#issuecomment-1442191856

Thanks @eitsupi ... this did not help in my case: loading the data consumed too much memory and crashed my PC. One possible workaround is to loop through the values of one of the columns, filter the data on each value, and write each subset to disk manually. I will try this and report back.

```
arrow::open_dataset(sources = Path, format = "csv", delim = "\t", quote = "") %>%
  # Some filtering
  arrow::write_dataset(path = OutPath, max_open_files = 100L,
                       max_rows_per_file = 1000L, format = "arrow")
```
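A minimal sketch of that per-value loop might look like the following. The grouping column `group_col` is hypothetical, and `Path`/`OutPath` are the placeholder paths from the snippet above:

```
library(arrow)
library(dplyr)

ds <- arrow::open_dataset(sources = Path, format = "csv",
                          delim = "\t", quote = "")

# Pull just the one (hypothetical) grouping column into memory and take its
# unique values; a single column is usually small enough to collect.
vals <- ds %>%
  select(group_col) %>%
  collect() %>%
  pull(group_col) %>%
  unique()

# Filter and write one Arrow dataset per value, so only the rows matching
# the current value are processed on each pass.
for (v in vals) {
  ds %>%
    filter(group_col == v) %>%
    arrow::write_dataset(path = file.path(OutPath, paste0("group=", v)),
                         format = "arrow")
}
```

(For what it's worth, `arrow::write_dataset()` also has a `partitioning` argument that splits output by column values in a single call, though it may hit the same memory pressure that motivated the manual loop.)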
Thanks @eitsupi ... This did not help in my case. Loading the data consumed high memory and crashed my PC. One possible solution is to loop through values of one of the columns, filter the data based on this value, then save to disk manually for each value.. Will apply this and see ``` arrow::open_dataset(sources = Path, format = "csv", delim = "\t", quote = "") %>% # Some filtering %>% arrow::write_dataset(path = OutPath, max_open_files = 100L, max_rows_per_file = 1000L, format = "arrow") ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org