nicki-dese commented on issue #45300: URL: https://github.com/apache/arrow/issues/45300#issuecomment-2608496717
Thank you @amoeba, both for your info and offer of a chat. I have done a lot of exploring of arrow, and more recently duckdb. For the majority of our work, targets plus data.table, with interim outputs saved as parquet via targets, has worked really well. We often start our targets pipeline with arrow::open_dataset to filter our data before bringing it into memory, which has been a game changer. However, open_dataset's schema inference from CSVs is much worse than both fread's and duckdb's, which has stopped our wholehearted adoption.
