toppyy commented on pull request #12083: URL: https://github.com/apache/arrow/pull/12083#issuecomment-1012386125
I think that makes a lot of sense! Trying to infer the column names from the schema added a surprising amount of complexity to the code for little to no added value for the user. As you say, they can pass the arguments using a different approach. Just to make sure I've got it right before I jump into making changes, the approach we're taking is this: instead of deriving `column_names` from the schema, we raise an error if `CsvReadOptions$create()` is used for `read_options` and it has no column names or they differ from the schema? If so, could the check for column names live inside `CsvFileFormat$create`? The issue is limited to formats that use this function (it is not relevant for Parquet). I can write this up in the documentation as part of this PR.
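
To make the proposed check concrete, here is a minimal sketch in plain Python rather than the arrow R code itself. The function name `check_column_names` and both of its arguments are hypothetical, and treating "differ from the schema" as an exact, order-sensitive comparison is an assumption, not something settled in this thread.

```python
def check_column_names(schema_names, column_names):
    """Validate user-supplied CSV column names against the schema's field names.

    Hypothetical helper: raises ValueError when column_names is missing/empty
    or does not match schema_names exactly (same names, same order).
    """
    schema_names = list(schema_names)
    column_names = list(column_names or [])

    # Case 1: read options were given, but without any column names.
    if not column_names:
        raise ValueError(
            "read_options has no column_names; expected: "
            + ", ".join(schema_names)
        )

    # Case 2: column names were given but disagree with the schema.
    # Assumption: order matters, so a reordered list is also rejected.
    if column_names != schema_names:
        raise ValueError(
            "column_names ("
            + ", ".join(column_names)
            + ") differ from schema names ("
            + ", ".join(schema_names)
            + ")"
        )

    return column_names
```

Under these assumptions, `check_column_names(["a", "b"], ["a", "b"])` passes, while an empty list or `["b", "a"]` raises an error instead of silently deriving names from the schema.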
