ZENOTME commented on PR #111:
URL:
https://github.com/apache/datasketches-rust/pull/111#issuecomment-4229316731
> That said, we still have sketches like BloomFilter and FrequentItemsSketch
that depend on Rust's std::hash::Hash. If we partially "fix" for CPC/HLL/Theta,
then it creates a new splitting point among sketches.
Yes, we should ensure that BloomFilter and FrequentItemsSketch are also
consistent with the other implementations. However, I noticed that even in the
C++ version, BloomFilter and Theta follow different canonicalization paths:
```
// bloom_filter
template<typename A>
void bloom_filter_alloc<A>::update(uint32_t item) {
update(static_cast<uint64_t>(item));
}
// theta
template<typename A>
void update_theta_sketch_alloc<A>::update(uint32_t value) {
update(static_cast<int32_t>(value));
}
template<typename A>
void update_theta_sketch_alloc<A>::update(int32_t value) {
update(static_cast<int64_t>(value));
}
```
Instead of introducing multiple hashing behaviors using multiple type, I
think it maybe more reasonable to define a single canonical hashing model and
make all sketches follow it.
> To replace std::hash::Hash with datasketches' own Hash trait is one
possible way to go, but then we need to consider whether this should be sealed
or users can implement for their own types
For now, I made `SketchHashable` a sealed trait to ensure behavior
consistent with the Java and C++ implementations. Users can only call `update`
with the supported types.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]