Fengzdadi opened a new issue, #86:
URL: https://github.com/apache/datasketches-go/issues/86

   ### Description
   
   I noticed a TODO comment at `hll/coupon_list.go:193`:
   
   ```go
   // TODO there must be a more efficient to reinterpret the byte array as an int array
   for it := 0; it < couponCount; it++ {
       list.couponIntArr[it] = int(binary.LittleEndian.Uint32(byteArray[listIntArrStart+it*4 : listIntArrStart+it*4+4]))
   }
   ```
   
   I'd like to discuss potential optimizations for this code path.
   
   ### Research
   
   I investigated how other DataSketches implementations handle this:
   
   | Language | Approach |
   |----------|----------|
   | **Java** | `Memory.getIntArray()` - bulk copy using `Unsafe.copyMemory()` |
   | **C++** | `std::memcpy()` - direct memory copy |
   | **Go (current)** | Loop with `binary.LittleEndian.Uint32()` |
   
   ### Benchmark Results
   
   I ran some benchmarks comparing different Go approaches:
   
   | Method | 8 items | 32 items | 128 items | 1024 items |
   |--------|---------|----------|-----------|------------|
   | **Current (loop)** | 27.6 ns | 75.4 ns | 270 ns | 2153 ns |
   | **binary.Read** | 101 ns | 237 ns | 692 ns | 5112 ns |
   | **Precompute offset** | 26.8 ns | 69.9 ns | 253 ns | 2000 ns |
   | **Loop unrolling** | 27.6 ns | 69.7 ns | 248 ns | 1998 ns |
   
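   For reference, the `binary.Read` variant in the table looked roughly like the sketch below (illustrative, not the exact benchmark code; it needs the `bytes` and `encoding/binary` packages):
   
   ```go
   // Illustrative sketch of the binary.Read variant; not the exact benchmark code.
   func readCouponsViaBinaryRead(byteArray []byte, listIntArrStart, couponCount int, dst []int) error {
       r := bytes.NewReader(byteArray[listIntArrStart : listIntArrStart+couponCount*4])
       tmp := make([]int32, couponCount)
       // binary.Read goes through an io.Reader and an intermediate buffer,
       // which is where the extra allocations and overhead come from.
       if err := binary.Read(r, binary.LittleEndian, tmp); err != nil {
           return err
       }
       for i, v := range tmp {
           dst[i] = int(v)
       }
       return nil
   }
   ```
   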
   **Key findings:**
   1. `binary.Read` is actually **slower** due to reflection and extra allocations
   2. Pre-computing the offset provides ~7% improvement
   3. The current implementation is already quite efficient
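   
   For anyone who wants to reproduce the comparison, a minimal harness along these lines works (illustrative, not the exact harness I used; goes in a `_test.go` file, run with `go test -bench=CouponDecode -benchmem`):
   
   ```go
   // Minimal benchmark sketch comparing the current loop with the precomputed-offset variant.
   func BenchmarkCouponDecode(b *testing.B) {
       const couponCount = 128
       byteArray := make([]byte, couponCount*4)
       dst := make([]int, couponCount)
   
       b.Run("current-loop", func(b *testing.B) {
           for n := 0; n < b.N; n++ {
               for it := 0; it < couponCount; it++ {
                   dst[it] = int(binary.LittleEndian.Uint32(byteArray[it*4 : it*4+4]))
               }
           }
       })
   
       b.Run("precomputed-offset", func(b *testing.B) {
           for n := 0; n < b.N; n++ {
               base := 0
               for it := 0; it < couponCount; it++ {
                   dst[it] = int(binary.LittleEndian.Uint32(byteArray[base:]))
                   base += 4
               }
           }
       })
   }
   ```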
   
   ### Proposed Changes
   
   **Option A: Minor optimization (pre-compute offset)**
   
   ```go
   base := listIntArrStart
   for it := 0; it < couponCount; it++ {
       list.couponIntArr[it] = int(binary.LittleEndian.Uint32(byteArray[base:]))
       base += 4
   }
   ```
   - ~7% performance improvement
   - Cleaner code
   - Remove or update TODO comment
   
   **Option B: Use `unsafe` package**
   
   ```go
   // Reinterpret the relevant bytes as an int32 slice without copying.
   // Assumes a little-endian host; couponIntArr is []int, so a widening copy is still needed.
   src := byteArray[listIntArrStart : listIntArrStart+couponCount*4]
   int32Slice := unsafe.Slice((*int32)(unsafe.Pointer(&src[0])), couponCount)
   ```
   - Near-zero overhead (similar to C++ memcpy)
   - Introduces `unsafe` dependency
   - May have portability concerns
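   
   For concreteness, here is a hedged sketch of how Option B could be wrapped with an endianness guard (the helper names are hypothetical, not code from the repo; requires the `unsafe` and `encoding/binary` imports):
   
   ```go
   // Hypothetical helper sketching Option B; not code from the repo.
   // On little-endian hosts the bytes are reinterpreted in place; otherwise
   // it falls back to the existing decode loop.
   func couponsFromBytes(byteArray []byte, listIntArrStart, couponCount int, dst []int) {
       src := byteArray[listIntArrStart : listIntArrStart+couponCount*4]
       if isLittleEndian() {
           int32Slice := unsafe.Slice((*int32)(unsafe.Pointer(&src[0])), couponCount)
           for i, v := range int32Slice {
               dst[i] = int(v) // widen into the []int destination
           }
           return
       }
       for i := 0; i < couponCount; i++ {
           dst[i] = int(binary.LittleEndian.Uint32(src[i*4:]))
       }
   }
   
   // isLittleEndian reports whether the host stores integers little-endian.
   func isLittleEndian() bool {
       var x uint16 = 1
       return *(*byte)(unsafe.Pointer(&x)) == 1
   }
   ```
   
   Even with the reinterpretation, a widening pass into `[]int` remains, so the practical win over Option A may be small; benchmarking on the target platforms would settle it.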
   
   **Option C: Keep current implementation**
   
   The current implementation is already quite efficient. Perhaps just update the TODO to:
   
   ```go
   // Note: Alternative approaches (binary.Read, unsafe) were benchmarked but did not
   // provide significant improvements. The current loop-based approach is optimal for Go.
   ```
   
   ### Questions for Maintainers
   
   1. Would you accept Option A (minor optimization) as a PR?
   2. Is the project open to using `unsafe` for performance-critical paths?
   3. Should the TODO be updated/removed to reflect that the current approach is adequate?
   
   ### Environment
   
   - Go version: 1.25.5
   - OS: Windows
   - CPU: Intel Core i7-10700K
   
   ---
   
   I'm happy to submit a PR for any of these options based on your feedback!
   

