sentomk opened a new issue, #690:
URL: https://github.com/apache/iceberg-cpp/issues/690

   While reading the scan-planning filtering path, I found a small optimization 
in the metrics evaluators. Using it as a concrete example to raise a broader 
question about how to validate this kind of change.
   
   ## Proposed change
   
   The metrics evaluators run per data file. Each predicate currently calls 
`expr->reference()` repeatedly, and `reference()` returns a `shared_ptr` via 
`shared_from_this()` — an atomic refcount bump every time. The 
`StrictMetricsEvaluator` macro even discards a `dynamic_cast` result only to 
re-fetch the same reference:
   
   ```diff
   - #define RETURN_IF_NOT_REFERENCE(expr)                                      
   \
   -   if (auto ref = dynamic_cast<BoundReference*>(expr.get()); ref == 
nullptr) { \
   -     return kRowsMightNotMatch;                                             
   \
   -   }
   + #define BIND_REFERENCE_OR_RETURN(ref, expr)                            \
   +   const auto* ref = dynamic_cast<const BoundReference*>((expr).get()); \
   +   if (ref == nullptr) {                                                \
   +     return kRowsMightNotMatch;                                         \
   +   }
   
       Result<bool> IsNull(const std::shared_ptr<Bound>& expr) override {
   -     RETURN_IF_NOT_REFERENCE(expr);
   -     int32_t id = expr->reference()->field().field_id();
   +     BIND_REFERENCE_OR_RETURN(ref, expr);
   +     int32_t id = ref->field().field_id();
         ...
   ```
   
   Reusing the cast result drops the repeated virtual `reference()` calls (and 
their atomic ops) across every predicate, with no behavior change.
   
   ## Expected benefit
   
   The win is on the CPU-bound filtering step, evaluated in isolation. Scan 
planning as a whole is IO-bound, so on an e2e scan this kind of change is 
almost certainly unmeasurable — which is exactly why it needs to be measured on 
the filtering step alone.
   
   ## Which raises the question: do we need a benchmark suite?
   
   This is exactly the kind of change that's hard to justify without one. The 
repo has no benchmark infrastructure today, only the gtest suite. A minimal 
benchmark on the filtering path would let us measure such changes on the 
CPU-bound step alone, rather than guessing or claiming a win against 
IO-dominated planning.
   
   So before going further:
   
   1. **How** should we add it — Google Benchmark fetched the same way 
googletest already is, behind an off-by-default CMake option?
   2. **Where** should it live — a top-level `benchmark/`, or co-located under 
`src/iceberg/**/`?
   
   I'm happy to put up a draft PR for a minimal suite + the filtering benchmark 
above once there's agreement on direction.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to