westonpace commented on code in PR #13567:
URL: https://github.com/apache/arrow/pull/13567#discussion_r917974623
##########
cpp/src/arrow/dataset/partition.h:
##########
@@ -63,13 +64,18 @@ struct ARROW_DS_EXPORT PartitionPathFormat {
/// Paths are consumed from left to right. Paths must be relative to
/// the root of a partition; path prefixes must be removed before passing
/// the path to a partitioning for parsing.
-class ARROW_DS_EXPORT Partitioning {
+class ARROW_DS_EXPORT Partitioning : public
util::EqualityComparable<Partitioning> {
public:
virtual ~Partitioning() = default;
/// \brief The name identifying the kind of partitioning
virtual std::string type_name() const = 0;
+ /// \brief determines if partiioning is exactly equal
+ virtual bool Equals(const Partitioning& other, bool check_metadata = false)
const {
Review Comment:
We shouldn't need a `check_metadata` flag here. This method here should
pass `check_metadata=false` in always.
If, by some chance, a future partitioning decides to depend on the schema
metadata (which should be possible) then it should override this method and
pass `check_metadata=true`.
##########
cpp/src/arrow/dataset/partition.cc:
##########
@@ -82,6 +82,14 @@ std::shared_ptr<Partitioning> Partitioning::Default() {
std::string type_name() const override { return "default"; }
+ bool Equals(const Partitioning& other, bool check_metadata) const override
{
+ if (this == &other) {
+ return true;
+ }
+ return checked_cast<const DefaultPartitioning&>(other).type_name() ==
type_name() &&
Review Comment:
I agree. If you received a pointer type `Partitioning*` you could do a
`dynamic_cast` instead of the `type_name` comparison. However, a failed
`dynamic_cast` is an exception when you have a reference type and it's probably
better to just compare `type_name`.
The comparison to `type_name` is basically doing the same thing as a cast
check anyways.
The reason a `checked_cast` is a bad idea is that it should be ok for a user
to do:
```
DefaultPartitioning one;
KeyValuePartitioning two(schema);
if (one.Equals(two)) {
std::cout << "equal" << std::endl;
}
```
This `Equals` method should always return false so this toy example is
somewhat meaningless. However, it could be useful if the actual partitions
were based on file metadata or something dynamically calculated. If there is a
`checked_cast` then, instead of false, we will get an abort in debug mode and
potentially a segmentation fault in release mode.
##########
cpp/src/arrow/dataset/partition.h:
##########
@@ -310,6 +322,11 @@ class ARROW_DS_EXPORT FunctionPartitioning : public
Partitioning {
std::string type_name() const override { return name_; }
+ bool Equals(const Partitioning& other, bool check_metadata) const override {
+ return ::arrow::internal::checked_cast<const FunctionPartitioning&>(other)
+ .type_name() == type_name();
Review Comment:
I was going to suggest we at least compare `parse_impl_` and `format_impl_`
but it appears [that is somewhat
meaningless](https://en.cppreference.com/w/cpp/utility/functional/function/operator_cmp).
I think we should always return false here and, if a user wants to allow
comparisons, they can override this.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]