jorisvandenbossche commented on pull request #11014: URL: https://github.com/apache/arrow/pull/11014#issuecomment-908099090
Can you explain a bit what the goal is for allowing a Partitioning object without a schema? (We already have a PartitioningFactory for the case where the schema gets inferred from parsed paths.)

What would be a use case? (I don't see any test where it is actually used for something.) I suppose it is meant as an alternative to https://github.com/apache/arrow/pull/11008, right?

> I also removed the corresponding accessor from python. I don't think this is a problem though because `GetSchema` would only work if there was a schema which means...
>
> 1. You created the partitioning with a schema in the first place so there is no need to pull it back out.
> 2. You created the schema with a FileSystemDatasetFactory so you can just get the field types from the dataset schema.

I added the partitioning attribute on a Dataset on purpose, so you can easily check the schema of your partitioning after reading a partitioned dataset (e.g. dask needs this).

It's true that you can also get the types of the partition columns from the dataset's schema if you still have access to the field names of the Partitioning (this doesn't yet seem to be exposed in this PR, though it should be easy to add), but that's certainly less convenient IMO.

(We could also still leave the attribute, but have it raise an error if the schema is not available.)
