gene-bordegaray commented on issue #21992:
URL: https://github.com/apache/datafusion/issues/21992#issuecomment-4378732353
I am in favor of a general trait as the long-term goal of this work. I think
allowing users to implement their own type of partitioning will make DF more
powerful in production use cases, since there will surely be partitioning
schemes that neither `Hash` nor `Range` captures (just as `Hash` did not fully
work for us). Off the top of my head, something like a `Value` partitioning
would also be useful:
```text
p0: col in ('a', 'd')
p1: col in ('b')
p2: col in ('c')
```
Because of this, I think providing another extension point for people (the
trait) will be very high value, even if it takes extra effort.
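To make the `Value` idea concrete, here is a minimal sketch of the `p0`/`p1`/`p2` layout above. The `ValuePartitioning` type and its methods are hypothetical names made up for illustration, not existing DataFusion APIs, and the sketch assumes a single string column:

```rust
use std::collections::HashMap;

/// Hypothetical value-based partitioning (illustration only, not a DataFusion type):
/// each partition owns an explicit set of column values.
#[derive(Debug)]
pub struct ValuePartitioning {
    /// Maps a column value to the index of the partition that owns it.
    value_to_partition: HashMap<String, usize>,
    /// Number of partitions the scheme was built with.
    partition_count: usize,
}

impl ValuePartitioning {
    /// Builds the lookup from per-partition value lists.
    /// Returns `None` if any value is listed more than once, because the
    /// owning partition would then be ambiguous.
    pub fn try_new(partitions: &[&[&str]]) -> Option<Self> {
        let mut value_to_partition = HashMap::new();
        for (idx, values) in partitions.iter().enumerate() {
            for v in values.iter() {
                if value_to_partition.insert(v.to_string(), idx).is_some() {
                    return None;
                }
            }
        }
        Some(Self {
            value_to_partition,
            partition_count: partitions.len(),
        })
    }

    /// Returns the partition that owns `value`, or `None` for an unlisted value.
    pub fn partition_for(&self, value: &str) -> Option<usize> {
        self.value_to_partition.get(value).copied()
    }

    /// Number of partitions in the scheme.
    pub fn partition_count(&self) -> usize {
        self.partition_count
    }
}
```

With the three partitions from the example, `partition_for("d")` resolves to partition 0, and a value that is not listed in any partition resolves to `None`. How unlisted values should be handled (error, default partition, and so on) is left open here.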
With this said, I do think we can create mergeable commits now by extending
the enum to support `Range` partitioning as @adriangb has described, while
modeling it after what our trait will look like. We can treat the trait as the
final goal and let `Range` help us define its requirements as we
pseudo-implement what that trait will look like.
> If we are going to go with a trait, it might be good to declare that we
> eventually want to shoot to remove all special cases for
> hash partitioning 🤔
And along with this, yes, I agree that we should shoot to encapsulate all
partitioning logic behind this trait, with no special cases. The optimizer
rules and other components should ask whether two partitionings are compatible
or satisfy one another, not just "is this Hash partitioned" 👍
I see this path as actually being more intuitive once done well.
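A rough sketch of what "ask the partitioning, don't match on `Hash`" could look like. Every name here (`PartitioningScheme`, `Requirement`, `HashScheme`, `RangeScheme`, `needs_repartition`) is hypothetical and is not DataFusion's current API. The satisfaction rule is a simplifying assumption: a scheme satisfies a co-location requirement when its key columns are a subset of the required columns.

```rust
use std::fmt::Debug;

/// Hypothetical requirement a plan node places on its input (illustration only).
#[derive(Debug, Clone, PartialEq)]
pub enum Requirement {
    /// Rows that are equal on these columns must land in the same partition.
    KeyColocated(Vec<String>),
    /// Any distribution is acceptable.
    Any,
}

/// Hypothetical trait sketch; not DataFusion's API.
pub trait PartitioningScheme: Debug {
    /// Number of output partitions.
    fn partition_count(&self) -> usize;

    /// Columns whose values fully determine which partition a row lands in.
    fn key_columns(&self) -> &[String];

    /// Default rule (an assumption of this sketch): a single partition satisfies
    /// everything; otherwise the key columns must be a non-empty subset of the
    /// required columns, so rows equal on the required columns are co-located.
    fn satisfies(&self, required: &Requirement) -> bool {
        match required {
            Requirement::Any => true,
            Requirement::KeyColocated(cols) => {
                let keys = self.key_columns();
                self.partition_count() == 1
                    || (!keys.is_empty() && keys.iter().all(|k| cols.contains(k)))
            }
        }
    }
}

/// Simplified stand-in for hash partitioning.
#[derive(Debug)]
pub struct HashScheme {
    pub columns: Vec<String>,
    pub partitions: usize,
}

impl PartitioningScheme for HashScheme {
    fn partition_count(&self) -> usize {
        self.partitions
    }
    fn key_columns(&self) -> &[String] {
        &self.columns
    }
}

/// Simplified stand-in for range partitioning on one integer column.
#[derive(Debug)]
pub struct RangeScheme {
    pub column: String,
    /// N split points define N + 1 partitions.
    pub split_points: Vec<i64>,
}

impl PartitioningScheme for RangeScheme {
    fn partition_count(&self) -> usize {
        self.split_points.len() + 1
    }
    fn key_columns(&self) -> &[String] {
        std::slice::from_ref(&self.column)
    }
}

/// What an optimizer rule would ask: no `match` on a concrete partitioning kind.
pub fn needs_repartition(input: &dyn PartitioningScheme, required: &Requirement) -> bool {
    !input.satisfies(required)
}
```

The point of the design is that `needs_repartition` never names `Hash` or `Range`; a user-defined scheme only has to implement the trait (and may override `satisfies`) to take part in the same optimizer decisions.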