westonpace commented on code in PR #14085:
URL: https://github.com/apache/arrow/pull/14085#discussion_r967273714
##########
cpp/src/arrow/dataset/file_base.cc:
##########
@@ -93,8 +93,11 @@ bool FileSource::Equals(const FileSource& other) const {
bool match_file_system =
(filesystem_ == nullptr && other.filesystem_ == nullptr) ||
(filesystem_ && other.filesystem_ &&
filesystem_->Equals(other.filesystem_));
- return match_file_system && file_info_.Equals(other.file_info_) &&
- buffer_->Equals(*other.buffer_) && compression_ == other.compression_;
+ bool match_buffer = (buffer_ == nullptr && other.buffer_ == nullptr) ||
+ ((buffer_ != nullptr && other.buffer_ != nullptr) &&
+ (buffer_->address() == other.buffer_->address()));
Review Comment:
I'm pretty sure the original idea was to be able to compare datasets.
That led to the need to compare formats, fragments, and, eventually, file
sources. I'm not sure it's used in practice anywhere other than unit tests.
Comparing by identity (the buffer address) seems safest. Two files can have
the same contents, but we wouldn't consider those file fragments equal.
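
To make the distinction concrete, here is a minimal, self-contained sketch of identity versus content comparison. `ToyBuffer`, `IdentityEquals`, and `ContentEquals` are hypothetical stand-ins made up for illustration, not Arrow's actual classes or the code in this PR. The null handling mirrors the `match_buffer` expression in the hunk above.

```cpp
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical stand-in for a buffer type; this is NOT arrow::Buffer.
struct ToyBuffer {
  std::vector<uint8_t> bytes;

  // Address of the first byte, used here as the buffer's identity.
  std::uintptr_t address() const {
    return reinterpret_cast<std::uintptr_t>(bytes.data());
  }
};

using BufferPtr = std::shared_ptr<ToyBuffer>;

// Identity comparison: equal only if both are null or both point at the
// same underlying memory.
bool IdentityEquals(const BufferPtr& a, const BufferPtr& b) {
  if (a == nullptr || b == nullptr) return a == nullptr && b == nullptr;
  return a->address() == b->address();
}

// Content comparison: equal if both are null or the bytes match, even when
// the bytes live in different allocations.
bool ContentEquals(const BufferPtr& a, const BufferPtr& b) {
  if (a == nullptr || b == nullptr) return a == nullptr && b == nullptr;
  return a->bytes == b->bytes;
}
```

With two separately allocated buffers holding the same bytes, `ContentEquals` reports a match while `IdentityEquals` does not, which is the behavior argued for above. Both helpers also check for null before dereferencing, so a source without a buffer does not crash the comparison.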