westonpace commented on a change in pull request #11008: URL: https://github.com/apache/arrow/pull/11008#discussion_r710303854
########## File path: python/pyarrow/dataset.py ########## @@ -678,17 +678,32 @@ def dataset(source, schema=None, format=None, filesystem=None, ) -def _ensure_write_partitioning(scheme): - if scheme is None: - scheme = partitioning(pa.schema([])) - if not isinstance(scheme, Partitioning): - # TODO support passing field names, and get types from schema - raise ValueError("partitioning needs to be actual Partitioning object") - return scheme +def _ensure_write_partitioning(part, schema, flavor): + if isinstance(part, Partitioning) and flavor: + raise ValueError( + "Providing a partitioning_flavor with " + "a Partitioning object is not supported" + ) + elif isinstance(part, (tuple, list)): + # Name of fields were provided instead of a partitioning object. + # Create a partitioning factory with those field names. + part = partitioning( + schema=pa.schema([schema.field(f) for f in part]), + flavor=flavor + ) + elif part is None: + part = partitioning(pa.schema([]), flavor=flavor) + + if not isinstance(part, Partitioning): + raise ValueError( + "partitioning must be a Partitioning object with a schema" Review comment: If you have a `Partitioning` object then it was created with a schema (there is no other way to get one in python that I can see). I think this is maybe to catch the case where the user did something like `ds.partitioning(['a'])` and thus got a partitioning factory? Also, the error is a little misleading because `partitioning` can be a list[str]. Maybe specifically test for `PartitioningFactory`? ``` if isinstance(part, PartitioningFactory): raise ValueError("A PartitioningFactory cannot be used. Did you call the partitioning method without supplying a schema?") elif not isinstance(part, Partitioning): raise ValueError("partitioning must be a Partitioning object or a list of column names") ``` @jorisvandenbossche thoughts? If you're fine with the error as is then I won't push. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org