jorisvandenbossche opened a new pull request #8912:

URL: https://github.com/apache/arrow/pull/8912
The C++ `FileSystemDatasetFactory::Finish` method handles schema inference and validation through two options. `InspectOptions::fragments` sets the number of fragments to use when inferring *or* validating the schema (default: 1). `FinishOptions::validate_fragments` indicates whether to validate a user-specified schema (i.e. when the schema is not inferred).

For now, I decided to combine these into a single keyword on the Python side (`validate_schema`). This avoids adding two interdependent keywords and makes some typical use cases easier to express. For example, validating the specified schema against all fragments is now `validate_schema=True` instead of `validate_schema=True, fragments=-1`.

On the other hand, it results in a single keyword that accepts either a boolean or an int, which is not super clean. So this is certainly up for discussion.
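To make the proposed mapping concrete, here is a minimal sketch of how a single bool-or-int `validate_schema` keyword could be translated into the two C++ settings. `FinishSettings` and `resolve_validate_schema` are hypothetical names for illustration, not part of pyarrow. The handling of `False` and of plain integers is an assumption inferred from the description above, with `-1` meaning "all fragments" as in the `fragments=-1` example.

```python
from dataclasses import dataclass
from typing import Union


# Hypothetical container mirroring the two C++ settings (not a pyarrow class).
@dataclass(frozen=True)
class FinishSettings:
    validate_fragments: bool  # mirrors FinishOptions::validate_fragments
    fragments: int            # mirrors InspectOptions::fragments (-1 = all)


def resolve_validate_schema(validate_schema: Union[bool, int]) -> FinishSettings:
    """Map one bool-or-int keyword onto both C++ options (hypothetical sketch)."""
    # bool is a subclass of int in Python, so the bool check must come first;
    # otherwise True would be treated as "validate against 1 fragment".
    if isinstance(validate_schema, bool):
        if validate_schema:
            # True: validate the specified schema against all fragments.
            return FinishSettings(validate_fragments=True, fragments=-1)
        # False: skip validation and keep the C++ default of 1 fragment.
        return FinishSettings(validate_fragments=False, fragments=1)
    if isinstance(validate_schema, int):
        # Assumption: an int means "validate against this many fragments".
        if validate_schema < 1:
            raise ValueError("validate_schema must be a bool or a positive int")
        return FinishSettings(validate_fragments=True, fragments=validate_schema)
    raise TypeError(
        f"validate_schema must be a bool or int, got {type(validate_schema).__name__}"
    )
```

With this sketch, `resolve_validate_schema(True)` yields `FinishSettings(validate_fragments=True, fragments=-1)` and `resolve_validate_schema(5)` validates against five fragments. The need to check `bool` before `int` is one concrete face of the "not super clean" trade-off: because `True == 1` in Python, a naive integer check would silently interpret `True` as a single fragment.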
