[GitHub] [arrow] jorisvandenbossche commented on pull request #11014: ARROW-13775: [C++] Allow Partitioning objects to be created with a vector of field names

GitBox Mon, 30 Aug 2021 00:20:49 -0700


jorisvandenbossche commented on pull request #11014:
URL: https://github.com/apache/arrow/pull/11014#issuecomment-908099090



   Can you explain a bit what the goal is of allowing a Partitioning object 
without a schema? (We already have a PartitioningFactory for the case where the 
schema gets inferred from the parsed paths.)
   
   What would be a use case? (I don't see any test where it is actually used 
for something.) I suppose it is meant as an alternative to 
https://github.com/apache/arrow/pull/11008, right?
   
   
   > I also removed the corresponding accessor from python. I don't think this 
is a problem though because `GetSchema` would only work if there was a schema 
which means...
   > 
   > 1. You created the partitioning with a schema in the first place so there 
is no need to pull it back out.
   > 2. You created the schema with a FileSystemDatasetFactory so you can just 
get the field types from the dataset schema.
   
   I added the partitioning attribute on a Dataset on purpose, so that you can 
easily check the schema of your partitioning after reading a partitioned 
dataset (e.g. dask needs this). It's true that you can also get the types of 
the partition columns from the dataset's schema, as long as you still have 
access to the field names of the Partitioning. Those don't yet seem to be 
exposed in this PR (though that should be easy to add), and that route is 
certainly less convenient IMO. 
   (We could also still leave the attribute, but have it raise an error if the 
schema is not available.)
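
   For illustration, the workaround of recovering the partition column types 
from the dataset's schema could look roughly like the sketch below. It is only 
a sketch under stated assumptions: plain dicts stand in for the schemas, and 
`partition_schema` is a hypothetical helper, not an existing Arrow or pyarrow 
function. It also assumes the field names of the Partitioning are available.

```python
# Hypothetical stand-ins: a plain dict models a schema as {field name: type name},
# so this sketch does not call any real pyarrow API.
def partition_schema(dataset_schema, partition_field_names):
    """Pick the partition columns (and their types) out of the dataset schema."""
    missing = [n for n in partition_field_names if n not in dataset_schema]
    if missing:
        raise KeyError(f"partition fields not in dataset schema: {missing}")
    # Keep the order of the partition field names, not of the dataset schema.
    return {name: dataset_schema[name] for name in partition_field_names}


dataset_schema = {"value": "double", "year": "int32", "month": "int32"}
print(partition_schema(dataset_schema, ["year", "month"]))
```

   Every consumer would need this extra lookup, which is why a direct schema 
accessor on the partitioning is more convenient.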


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

