proost opened a new issue, #56: URL: https://github.com/apache/datasketches-go/issues/56
# Summary

Add Compact Sketch, and add the Compact methods to Update Sketch.

# Design

In this phase I will handle only `CompactSketch`. The wrapped compact sketch will be handled in phase 4: this phase already adds a large amount of code, and handling the wrapped compact sketch at the same time would make both coding and code review harder.

## Compact Sketch

Below are the methods and signatures of `CompactSketch`. Most of the methods come from the `Sketch` interface.

```go
// CompactSketch is an immutable form of the Theta sketch, the form that can be serialized and deserialized
type CompactSketch struct

// IsEmpty returns true if this sketch represents an empty set
func (s *CompactSketch) IsEmpty() bool

// IsOrdered returns true if retained entries are ordered
func (s *CompactSketch) IsOrdered() bool

// Theta64 returns theta as a positive integer
func (s *CompactSketch) Theta64() uint64

// NumRetained returns the number of retained entries
func (s *CompactSketch) NumRetained() uint32

// SeedHash returns hash of the seed
func (s *CompactSketch) SeedHash() (uint16, error)

// Estimate returns the estimate of distinct count
func (s *CompactSketch) Estimate() float64

// LowerBound returns the approximate lower error bound
func (s *CompactSketch) LowerBound(numStdDevs uint8) (float64, error)

// UpperBound returns the approximate upper error bound
func (s *CompactSketch) UpperBound(numStdDevs uint8) (float64, error)

// IsEstimationMode returns true if the sketch is in estimation mode
func (s *CompactSketch) IsEstimationMode() bool

// Theta returns theta as a fraction from 0 to 1
func (s *CompactSketch) Theta() float64

// String provides a human-readable summary
func (s *CompactSketch) String(shouldPrintItems bool) string

// All returns hash values in this sketch
func (s *CompactSketch) All() iter.Seq[uint64]

// MaxSerializedSizeBytes computes maximum serialized size in bytes
// lgK is the nominal number of entries in the sketch
func (s *CompactSketch) MaxSerializedSizeBytes(lgK uint8) uint8

// SerializedSizeBytes computes the size in bytes required to serialize the current state of the sketch.
// Computing the compressed size is expensive: it requires iterating over all retained hashes,
// and the actual serialization will have to look at them again.
// If compressed is true, the compressed size is returned (if applicable).
func (s *CompactSketch) SerializedSizeBytes(compressed bool) int

// MarshalBinary implements encoding.BinaryMarshaler (uncompressed)
func (s *CompactSketch) MarshalBinary() ([]byte, error)
```

`CompactSketch` implements [BinaryMarshaler](https://pkg.go.dev/encoding#BinaryMarshaler), but it can't implement `UnmarshalBinary`, because deserialization needs an explicit `seed`. So I follow the Encoder / Decoder pattern used by the [encoding/gob](https://pkg.go.dev/encoding/gob), [encoding/json](https://pkg.go.dev/encoding/json#Encoder.Encode), and [encoding/xml](https://pkg.go.dev/encoding/xml) packages. One difference from those packages is that `Decoder` and `Encoder` are used as values. The methods of `Encoder` and `Decoder` do not change the receiver's state, so by using values we can avoid heap allocation.

```go
// Decoder decodes a compact sketch from the given reader.
type Decoder struct {
	seed uint64
}

// NewDecoder creates a new decoder.
func NewDecoder(seed uint64) Decoder {
	return Decoder{
		seed: seed,
	}
}

// Decode decodes a compact sketch from the given reader.
func (dec Decoder) Decode(r io.Reader) (*CompactSketch, error)

// Encoder encodes a compact theta sketch to bytes.
type Encoder struct {
	w          io.Writer
	compressed bool
}

// NewEncoder creates a new encoder.
func NewEncoder(w io.Writer, compressed bool) Encoder {
	return Encoder{w: w, compressed: compressed}
}

// Encode encodes a compact theta sketch to bytes.
func (enc Encoder) Encode(sketch *CompactSketch) error
```

For convenience, I will also add `MarshalBinary` for serialization and a package-level `Decode` function, which acts as a static factory for `CompactSketch`.

```go
// Decode decodes a compact sketch from the given bytes.
func Decode(bytes []byte, seed uint64) (*CompactSketch, error)
```

## Update Sketch

I will add two methods that were missing from phase 2.

```go
func (s *QuickSelectUpdateSketch) Compact(ordered bool) *CompactSketch

func (s *QuickSelectUpdateSketch) CompactOrdered() *CompactSketch
```

# Implementation Schedule

I will upload 3 PRs.

1. A PR for the bit-packing and count-leading-zeros utilities. These utilities are used by the compact sketch.
2. A PR for the compact sketch. This PR will cover everything related to the compact sketch, so it will also include the compatibility tests between C++, Java, and Go.
3. A PR for the update sketch.
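
The value-type Encoder / Decoder pattern described under Design can be illustrated with a minimal, self-contained sketch that shows why decoding needs the seed passed in explicitly. Everything in it is an assumption made for illustration: `toySketch` stands in for `CompactSketch`, `toySeedHash` is a placeholder XOR fold rather than the hash the library uses, the byte layout is invented, and the `compressed` option is left out.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
	"io"
)

// toySketch is a hypothetical stand-in for CompactSketch: it keeps only
// a seed hash and the retained hash values.
type toySketch struct {
	seedHash uint16
	entries  []uint64
}

// toySeedHash folds the seed into 16 bits with XOR. It is a placeholder
// for illustration, NOT the seed hash used by the real library.
func toySeedHash(seed uint64) uint16 {
	return uint16(seed) ^ uint16(seed>>16) ^ uint16(seed>>32) ^ uint16(seed>>48)
}

var errSeedMismatch = errors.New("seed hash mismatch")

// Encoder is used as a value: Encode only reads its fields.
type Encoder struct{ w io.Writer }

func NewEncoder(w io.Writer) Encoder { return Encoder{w: w} }

// Encode writes the seed hash, the entry count, then the entries
// (little-endian). This layout is invented for the example.
func (enc Encoder) Encode(s *toySketch) error {
	if err := binary.Write(enc.w, binary.LittleEndian, s.seedHash); err != nil {
		return err
	}
	if err := binary.Write(enc.w, binary.LittleEndian, uint32(len(s.entries))); err != nil {
		return err
	}
	return binary.Write(enc.w, binary.LittleEndian, s.entries)
}

// Decoder is also a value; it carries the seed that decoding requires.
type Decoder struct{ seed uint64 }

func NewDecoder(seed uint64) Decoder { return Decoder{seed: seed} }

// Decode reads the toy layout back and rejects data whose seed hash does
// not match the decoder's seed.
func (dec Decoder) Decode(r io.Reader) (*toySketch, error) {
	var seedHash uint16
	if err := binary.Read(r, binary.LittleEndian, &seedHash); err != nil {
		return nil, err
	}
	if seedHash != toySeedHash(dec.seed) {
		return nil, errSeedMismatch
	}
	var n uint32
	if err := binary.Read(r, binary.LittleEndian, &n); err != nil {
		return nil, err
	}
	entries := make([]uint64, n)
	if err := binary.Read(r, binary.LittleEndian, entries); err != nil {
		return nil, err
	}
	return &toySketch{seedHash: seedHash, entries: entries}, nil
}

// roundTrip encodes s into a buffer, then decodes it with decodeSeed.
func roundTrip(s *toySketch, decodeSeed uint64) (*toySketch, error) {
	var buf bytes.Buffer
	if err := NewEncoder(&buf).Encode(s); err != nil {
		return nil, err
	}
	return NewDecoder(decodeSeed).Decode(&buf)
}

func main() {
	const seed uint64 = 9001
	s := &toySketch{seedHash: toySeedHash(seed), entries: []uint64{11, 22, 33}}

	got, err := roundTrip(s, seed)
	if err != nil {
		panic(err)
	}
	fmt.Println(got.entries) // prints [11 22 33]

	_, err = roundTrip(s, seed+1)
	fmt.Println(err) // prints seed hash mismatch
}
```

Because `Encode` and `Decode` only read their receiver's fields, an `Encoder` or `Decoder` value can be copied and reused freely. The serialized bytes carry only a hash of the seed, so the caller has to supply the seed itself, which is why a plain `UnmarshalBinary` signature is not enough.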
