proost opened a new issue, #52:
URL: https://github.com/apache/datasketches-go/issues/52

   # Summary
   
   This time I will add update sketch 
   
   # Design
   
   ## `QuickSelectUpdateSketch`
   
   There are 3 different versions of update sketches in Java, Quick Select, 
Alpha, Concurrent Quick Select. I will implement Quick Select first. If needed, 
other sketches implement later.
   
   Below is methods and class.
   
   ```go
   var (
        ErrUpdateEmptyString = errors.New("cannot update empty string")
        ErrDuplicateKey      = errors.New("duplicate key")
   )
   
   // QuickSelectUpdateSketch is an Update Theta sketch based on the 
QuickSelect algorithm.
   // The purpose of this class is to build a Theta sketch from input data via 
the update() methods.
   type QuickSelectUpdateSketch struct {
        table *Hashtable
   }
   
   // IsEmpty returns true if this sketch represents an empty set
   func (s *QuickSelectUpdateSketch) IsEmpty() bool
   
   // IsOrdered returns true if retained entries are ordered
   func (s *QuickSelectUpdateSketch) IsOrdered() bool
   
   // Theta64 returns theta as a positive integer between 0 and math.MaxInt64
   func (s *QuickSelectUpdateSketch) Theta64() uint64
   
   // NumRetained returns the number of retained entries in the sketch
   func (s *QuickSelectUpdateSketch) NumRetained() uint32
   
   // SeedHash returns hash of the seed that was used to hash the input
   func (s *QuickSelectUpdateSketch) SeedHash() (uint16, error)
   
   // Estimate returns estimate of the distinct count of the input stream
   func (s *QuickSelectUpdateSketch) Estimate() float64
   
   // LowerBound returns the approximate lower error bound given a number of 
standard deviations.
   // This parameter is similar to the number of standard deviations of the 
normal distribution
   // and corresponds to approximately 67%, 95% and 99% confidence intervals.
   // numStdDevs number of Standard Deviations (1, 2 or 3)
   func (s *QuickSelectUpdateSketch) LowerBound(numStdDevs uint8) (float64, 
error)
   
   // UpperBound returns the approximate upper error bound given a number of 
standard deviations.
   // This parameter is similar to the number of standard deviations of the 
normal distribution
   // and corresponds to approximately 67%, 95% and 99% confidence intervals.
   // numStdDevs number of Standard Deviations (1, 2 or 3)
   func (s *QuickSelectUpdateSketch) UpperBound(numStdDevs uint8) (float64, 
error)
   
   // IsEstimationMode returns true if the sketch is in estimation mode
   // (as opposed to exact mode)
   func (s *QuickSelectUpdateSketch) IsEstimationMode() bool
   
   // Theta returns theta as a fraction from 0 to 1 (effective sampling rate)
   func (s *QuickSelectUpdateSketch) Theta() float64
   
   // String returns a human-readable summary of this sketch as a string
   // If shouldPrintItems is true, include the list of items retained by the 
sketch
   func (s *QuickSelectUpdateSketch) String(shouldPrintItems bool) string
   
   // LgK returns configured nominal number of entries in the sketch
   func (s *QuickSelectUpdateSketch) LgK() uint8
   
   // ResizeFactor returns a configured resize factor of the sketch
   func (s *QuickSelectUpdateSketch) ResizeFactor() ResizeFactor
   
   // UpdateUint64 updates this sketch with a given unsigned 64-bit integer
   // Only update when the value is not existing, if not returns ErrDuplicateKey
   func (s *QuickSelectUpdateSketch) UpdateUint64(value uint64) error
   
   // UpdateInt64 updates this sketch with a given signed 64-bit integer
   // Only update when the value is not existing, if not returns ErrDuplicateKey
   func (s *QuickSelectUpdateSketch) UpdateInt64(value int64) error
   
   // UpdateUint32 updates this sketch with a given unsigned 32-bit integer
   // Only update when the value is not existing, if not returns ErrDuplicateKey
   func (s *QuickSelectUpdateSketch) UpdateUint32(value uint32) error
   
   // UpdateInt32 updates this sketch with a given signed 32-bit integer
   // Only update when the value is not existing, if not returns ErrDuplicateKey
   func (s *QuickSelectUpdateSketch) UpdateInt32(value int32) error
   
   // UpdateUint16 updates this sketch with a given unsigned 16-bit integer
   // Only update when the value is not existing, if not returns ErrDuplicateKey
   func (s *QuickSelectUpdateSketch) UpdateUint16(value uint16) error
   
   // UpdateInt16 updates this sketch with a given signed 16-bit integer
   // Only update when the value is not existing, if not returns ErrDuplicateKey
   func (s *QuickSelectUpdateSketch) UpdateInt16(value int16) error
   
   // UpdateUint8 updates this sketch with a given unsigned 8-bit integer
   // Only update when the value is not existing, if not returns ErrDuplicateKey
   func (s *QuickSelectUpdateSketch) UpdateUint8(value uint8) error
   
   // UpdateInt8 updates this sketch with a given signed 8-bit integer
   // Only update when the value is not existing, if not returns ErrDuplicateKey
   func (s *QuickSelectUpdateSketch) UpdateInt8(value int8) error
   
   // UpdateFloat64 updates this sketch with a given double-precision floating 
point value
   // Only update when the value is not existing, if not returns ErrDuplicateKey
   func (s *QuickSelectUpdateSketch) UpdateFloat64(value float64) error
   
   // UpdateFloat32 updates this sketch with a given floating point value
   // Only update when the value is not existing, if not returns ErrDuplicateKey
   func (s *QuickSelectUpdateSketch) UpdateFloat32(value float32) error
   
   // UpdateString updates this sketch with a given string
   // Only update when the value is not existing, if not returns ErrDuplicateKey
   // Returns ErrUpdateEmptyString when value is empty
   func (s *QuickSelectUpdateSketch) UpdateString(value string) error
   
   // UpdateBytes updates this sketch with given data
   // Only update when the value is not existing, if not returns ErrDuplicateKey
   func (s *QuickSelectUpdateSketch) UpdateBytes(data []byte) error
   
   // Trim removes retained entries in excess of the nominal size k (if any)
   func (s *QuickSelectUpdateSketch) Trim()
   
   // Reset resets the sketch to the initial empty state
   func (s *QuickSelectUpdateSketch) Reset()
   
   // All returns an iterator over hash values in this sketch
   func (s *QuickSelectUpdateSketch) All() iter.Seq[uint64]
   ```
   I will stick to go idiom naming convention. 
   
   Here is what I considered.
   
   1. I’ve considered adding `UpdateSketch` interface. If I `UpdateSketch` , 
`UpdateSketch` is big interface. I'm a little unsure maybe later update sketch 
can implement all the methods. So I just adding implementation.
   2. I’ve considered multiple `UpdateX` methods to one method using generic.  
Generics don't help here since each type needs different handling. Also, no 
need to type check and branch is more clean.
   3. In C++ implementation, `Update` method raise exception or silently 
ignored. but I returned error explicitly so user can handle it.
   4. explicitly returns `ErrUpdateEmptyString` when `value` is empty string to 
`UpdateString` . In C++ implementation, skipping it silently. but I prefer 
explicit one.
   
   What I missed:
   
   1. `Compact` method. I plan to add `Compact` method to phase 3. Because In 
Phase 3, I will add compact sketch. So In this phase, I will skip it 
intentionally.
   
   # Implementation Schedule
   
   I will all code using a 2 PRs. 
   
   One is internal package `binomialbounds` which is ported from 
https://github.com/apache/datasketches-cpp/blob/master/common/include/binomial_bounds.hpp.
 This package is needed for calculating lower bound and upper bound.
   
   Second is `UpdateSketch` which I said above.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to