thisisnic commented on issue #38456:
URL: https://github.com/apache/arrow/issues/38456#issuecomment-1782360840

   The thing is, though, they're almost but not quite interchangeable. There
are some parameters in readr::read_csv that arrow doesn't have, and there are
reasons why folks might want both in one script. And while you may think it's
best practice to namespace read/write function calls (I also do this for
clarity in my scripts), it's not a universally adopted practice, and we can't
assume everyone else is doing it.
   
   One example of where you might want both: you need to read a CSV in as an
Arrow Table, but the CSV isn't valid (perhaps one row is missing a comma).
readr::read_csv lets you read just one row of data from any point in the file
(arrow doesn't work quite like this), so you can use arrow's error output to
work out which row is broken and then have readr read in only that row so you
can inspect it.
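A minimal sketch of that workflow in R, assuming readr 2.x is installed (arrow is optional here). `read_one_row()` is a hypothetical helper, not part of readr or arrow. The exact wording of arrow's parse error varies between Arrow versions, so in this sketch the broken row number is supplied by hand after reading the message.

```r
# Hypothetical helper (not part of readr or arrow): read exactly one data
# row from a CSV, where `row` is the 1-based data row number and the file
# has a single header line.
read_one_row <- function(path, row) {
  readr::read_csv(
    path,
    skip = row,         # skips the header line plus (row - 1) data rows
    n_max = 1,          # read only the row of interest
    col_names = FALSE,  # the header was skipped, so columns become X1, X2, ...
    col_types = readr::cols(.default = readr::col_character())
  )
}

# Example file whose third line is missing a comma
path <- tempfile(fileext = ".csv")
writeLines(c("x,y,z", "1,2,3", "4,5 6", "7,8,9"), path)

# arrow is expected to reject the malformed file; its error message is what
# you would use to locate the broken row (wording depends on the version)
if (requireNamespace("arrow", quietly = TRUE)) {
  res <- try(arrow::read_csv_arrow(path), silent = TRUE)
  if (inherits(res, "try-error")) message(conditionMessage(attr(res, "condition")))
}

# Inspect data row 2 (the third line of the file) with readr
bad <- read_one_row(path, 2)
print(bad)
```

Because `skip` counts lines rather than records, this sketch assumes no quoted field in the file spans multiple lines.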
   
   I mean, the benefit of arrow is being able to work with Arrow objects when
that's needed (larger-than-memory data, tight control of data types, some file
reading options that readr doesn't have); while our API could be cleaner, I'm
not sure it follows that this negates those benefits :)
   
   Maybe the distinction here is more subtle: whereas you could view the file
reading functions in, e.g., data.table and readr as belonging to different
frameworks that are unlikely to be mixed in one script, I'd consider readr and
arrow much more complementary.
   
   It all depends on what angle you're coming at it from. If you recall, in
the posit::conf workshop the focus was on `open_dataset()` and much less on
`arrow::read_*`, as ultimately the former is where you get the most benefit
from working with arrow.
   
   There are a whole load of API changes that would make sense, but I don't
personally have time for them, and they're lower priority to me than bug fixes
and docs. However, if you were to open "enhancement request" issues for
individual changes, including the reasoning (as opposed to asking a question,
which prompts a different kind of conversation/interaction), then we can have a
wider conversation there, and if there's no reason *not* to make that change,
you're welcome to submit a PR.

