westonpace commented on issue #14880: URL: https://github.com/apache/arrow/issues/14880#issuecomment-1341728017
> Thanks. Do you have any comments on the question about random chunking of rows and whether that offers any benefit? Or does partitioning a dataset only offer benefits if the partitioning is based on a sensible column?

Random chunking will not change the amount of memory used.

> But this is seemingly just as slow as loading the full table. I had thought only ... columns would be read into memory so there would be a time savings.

That should be the case. I believe this was broken in r-arrow 9, so if you are using version 9 you might try version 8 or 10.

We always read CSV data in blocks (usually 4MB) and we should drop the data you don't need after reading it in. So even for CSV I would expect to see memory savings when using a select. I'm afraid I don't know R well enough to say why that didn't work.

That being said, parquet is always a good idea too :)
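For reference, here is a minimal R sketch of the select-then-collect pattern being discussed. The file path and column names are hypothetical, not from the original issue:

```r
library(arrow)
library(dplyr)

# Open a CSV-backed dataset lazily (no full read yet).
ds <- open_dataset("data/big.csv", format = "csv")

# Only the selected columns should be materialized; the rest of each
# block is dropped after parsing, so peak memory should stay low.
result <- ds %>%
  select(id, value) %>%
  collect()
```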
