nateab opened a new pull request, #27521:
URL: https://github.com/apache/flink/pull/27521

   ## What is the purpose of the change
   
   Remove a misleading 5-year-old TODO comment in 
`StreamMultipleInputProcessor.fullCheckAndSetAvailable()` that suggested 
batching availability checks per NetworkBuffer.
   
   The TODO (added by Piotr Nowojski in Feb 2020) claimed that the volatile 
reads in `isAvailable()` are expensive when performed per record. The benchmarks 
below show that the suggested batching is unnecessary.
   
   ## Benchmark Results
   
   **Isolated benchmark (5M iterations, 5 trials):**
   | Pattern | Time | Notes |
   |---------|------|-------|
   | Volatile read (isDone) | ~0.5-0.8 ns | The "expensive" operation |
   | Current pattern | 2.1-2.4 ns | `isApproximatelyAvailable() \|\| isAvailable()` |
   | Batched pattern | 1.6-1.8 ns | Check every 100 records |
   | Apparent savings | 21-28% | Looks good in isolation... |
   
   **Realistic benchmark (with simulated record processing):**
   | Processing Level | Check Overhead | % of Total Time | Batching Savings |
   |-----------------|----------------|-----------------|------------------|
   | Light (~100ns) | 0.1 ns | 0.06% | -0.2 ns (worse!) |
   | Typical (~500ns) | 2.2 ns | 0.49% | 0.7 ns (0.15%) |
   | Heavy (~1000ns) | 4.5 ns | 0.49% | -0.9 ns (worse!) |
   
   ## Key Findings
   
   1. **Availability check overhead is <1% of record processing time** in all 
scenarios
   2. **Batching actually performs worse** in two of the three realistic 
scenarios (light and heavy processing) due to cache/modulo overhead
   3. The code already uses `isApproximatelyAvailable()` as a fast path 
(reference comparison)
   4. The proposed optimization would add complexity for negligible benefit
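
The two patterns compared in the benchmarks can be sketched roughly as follows. This is a minimal illustration, not Flink's actual implementation: `Input`, `checkPerRecord`, and `BatchedChecker` are hypothetical names, and the bodies of `isApproximatelyAvailable()` and `isAvailable()` are assumed from the description above (a reference comparison against a shared completed future, with `isDone()` as the volatile read).

```java
import java.util.concurrent.CompletableFuture;

class AvailabilityCheckSketch {
    // Shared sentinel: an already-completed future meaning "available right now".
    static final CompletableFuture<Void> AVAILABLE = CompletableFuture.completedFuture(null);

    /** Hypothetical stand-in for an input gate; not a Flink class. */
    static final class Input {
        private CompletableFuture<Void> future = AVAILABLE;

        void setFuture(CompletableFuture<Void> future) {
            this.future = future;
        }

        // Fast path: a plain reference comparison, no volatile read.
        boolean isApproximatelyAvailable() {
            return future == AVAILABLE;
        }

        // Slow path: isDone() reads the future's volatile result field.
        boolean isAvailable() {
            return future == AVAILABLE || future.isDone();
        }
    }

    // Current pattern: check on every record, trying the cheap comparison first.
    static boolean checkPerRecord(Input input) {
        return input.isApproximatelyAvailable() || input.isAvailable();
    }

    /** Batched pattern (the idea in the removed TODO): refresh only every N calls. */
    static final class BatchedChecker {
        private final int interval;
        private int counter;
        private boolean cached = true;

        BatchedChecker(int interval) {
            this.interval = interval;
        }

        boolean check(Input input) {
            // The counter and modulo are the extra per-record work batching adds.
            if (counter++ % interval == 0) {
                cached = checkPerRecord(input);
            }
            return cached;
        }
    }
}
```

In this sketch the batched variant needs extra state and a modulo on every call, and its cached answer can lag the real availability by up to `interval - 1` records.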
   
   ## Brief change log
   
   - Replace misleading TODO with accurate comment explaining the fast-path 
optimization
   
   ## Does this pull request potentially affect one of the following parts
   
   - Dependencies: no
   - Public API: no
   - Serializers: no
   - Runtime/Coordination: yes (comment only, no behavior change)
   - SQL/Table API: no
   - Connectors: no
   - Checkpointing: no
   
   ## Documentation
   
   - Does this pull request introduce a new feature? no
   - If yes, how is the feature documented? N/A

