thisisnic commented on pull request #12083:
URL: https://github.com/apache/arrow/pull/12083#issuecomment-1012889685


   Yeah, spot on. There is an extra level of complexity here because the way 
we currently infer column names from a schema is not quite correct (this is 
being dealt with as part of another issue).
   
   Are you happy to leave the existing code that derives column names from a 
schema as it was before any of the changes in this PR and, as you say, raise 
an error if `read_options` is created with `CsvReadOptions$create()` but is 
not consistent with the schema?
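   
   As a rough sketch of the kind of check I mean (the helper name 
`check_column_names()` and its arguments are hypothetical and not part of 
arrow; it works on plain character vectors, so how the names are pulled out of 
the schema and the read options is left as an assumption):

```r
# Hypothetical helper, not part of arrow: compares the column names supplied
# via read options against the names taken from the schema.
check_column_names <- function(schema_names, option_names) {
  # Nothing to validate if the user did not supply any column names.
  if (is.null(option_names) || length(option_names) == 0) {
    return(invisible(TRUE))
  }
  # Names must match exactly, including their order.
  if (!identical(as.character(schema_names), as.character(option_names))) {
    stop(
      "Column names in `read_options` do not match the schema.\n",
      "  schema:       ", paste(schema_names, collapse = ", "), "\n",
      "  read_options: ", paste(option_names, collapse = ", "),
      call. = FALSE
    )
  }
  invisible(TRUE)
}

# Matching names pass silently; mismatched names raise an error.
check_column_names(c("id", "value"), c("id", "value"))
```

   Whether the comparison should be order-sensitive is a design choice to 
settle in the PR; this sketch assumes it is.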
   
   There's a PR which was just merged that does a similar thing but relates to 
partitioning, and is a great example of this kind of thing being done really 
nicely: 
https://github.com/apache/arrow/blob/99f7c3cf3e6c2a9555ceff3d48ef73e485ede546/r/R/dataset-factory.R#L85-L95
   
   What you said about where the code for this should go sounds right, and 
thanks for volunteering to update the docs too!
   
   Thanks for sticking with this despite how drastically we've changed the 
approach to resolving it. Even though we're not using the code from your 
previous solutions, testing them out and the surrounding discussion have been 
really helpful for identifying some substantial improvements to how the schema 
and column name components interact, and to how we direct our users to work 
with `open_dataset()`.



