thisisnic commented on issue #38456: URL: https://github.com/apache/arrow/issues/38456#issuecomment-1782360840
The thing is though, they're almost but not directly interchangeable. There are some params in readr::read_csv which arrow doesn't have, and reasons why folks might want both in one script. And while you may think it's best practice to namespace read/write function calls (I also do this for clarity in my scripts), it's not something which is necessarily widely accepted/done, and we can't assume everyone else is doing that.
An example of where you might want both is needing to read in a CSV as an Arrow Table, but the CSV not being valid (a missing comma on one row perhaps); readr::read_csv allows you to read in just one row of data from any point in the file (arrow doesn't work quite like this), so you can use the error output from arrow to work out which row to get readr to read in so you can inspect it.
I mean, the benefit of arrow is being able to work with Arrow objects when that's needed (larger-than-memory data, tight control of data types, and some file reading options that readr does not have); while our API could be cleaner, I'm not sure it follows logically that that fact negates those benefits :)
Maybe the distinction here is more subtle; whereas you could view the file reading functions in e.g. data.table and readr as different frameworks where one is less likely to have a mix of both in a script, I'd consider readr and arrow as much more complementary. It all depends on what angle you're coming at it from - if you recall, in the posit::conf workshop, the focus was on `open_dataset()` and much less on `arrow::read_*`, as ultimately, the former is where you get the most benefit of working with arrow.
There are a whole load of API changes which would make sense, but I don't personally have time and they're lower priority to me than bugfixes and docs.
However, if you were to open "enhancement request" issues for individual changes, with the reasoning there too (as opposed to asking a question, which prompts a different kind of conversation/interaction), then we can have a wider conversation there and if there's no reason *not* to make that change, you're welcome to submit a PR.
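
To make the debugging workflow mentioned earlier a bit more concrete, here is a rough sketch of how the two packages might be combined. The sample data, the `bad_line` variable, and the choice of line 3 are all made up for illustration; the exact wording of arrow's error (and whether any row number it reports counts the header line) may vary by version, so treat the `skip` arithmetic as something to adjust by hand.

```r
# Build a small CSV where one row is missing a comma (illustrative data only).
path <- tempfile(fileext = ".csv")
writeLines(c("a,b,c", "1,2,3", "4 5,6", "7,8,9"), path)

# Try to read it as an Arrow Table; capture the error instead of stopping.
result <- tryCatch(
  arrow::read_csv_arrow(path, as_data_frame = FALSE),
  error = function(e) e
)
if (inherits(result, "error")) {
  # Inspect the message to work out roughly where the file is broken.
  message(conditionMessage(result))
}

# Assumption: from the error text we decided, by hand, that the third line
# of the file (counting the header) is the problematic one.
bad_line <- 3

# Skip everything before that line and read just that one row with readr.
# col_names = FALSE stops readr treating the suspect row as a header.
suspect <- readr::read_csv(
  path,
  skip = bad_line - 1,
  n_max = 1,
  col_names = FALSE
)
print(suspect)
```

Both calls are namespaced here purely so the sketch is unambiguous about which package does what; `skip`, `n_max`, and `col_names` are the readr arguments doing the work of pulling out a single row.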
