Akanksha-kedia opened a new pull request, #18898:
URL: https://github.com/apache/pinot/pull/18898

   ## Description
   
   Previously, when a user changed the `fpp` (false positive probability) 
config of a bloom filter index, Pinot did not detect the change and did not 
rebuild the index. As a workaround, users had to:
   1. Remove the bloom filter config
   2. Reload table
   3. Re-add bloom filter config with new fpp
   4. Reload table again
   
   This PR adds fpp change detection to `BloomFilterHandler`, following the 
same pattern used for H3 index resolution detection (PR #16953). The detection 
works by comparing the number of hash functions stored in the existing bloom 
filter with what the new fpp config would produce (given the column's 
cardinality). If they differ, the bloom filter is removed and recreated with 
the updated config.
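
For reference, the sizing arithmetic behind this comparison can be written out as follows. This is a sketch of Guava's `BloomFilter` sizing as commonly described, not a quote from the PR; the symbols $n$ (expected insertions, here the column cardinality), $p$ (fpp), $m$ (number of bits) and $k$ (number of hash functions) are notation introduced only for this note.

$$
m = \left\lfloor \frac{-n \ln p}{(\ln 2)^2} \right\rfloor,
\qquad
k = \max\left(1,\ \operatorname{round}\left(\frac{m}{n} \ln 2\right)\right)
$$

For example, with $n = 1000$: $p = 0.1$ gives $m = 4792$ and $k = 3$, while $p = 0.01$ gives $m = 9585$ and $k = 7$, so the stored and expected values of $k$ differ and a rebuild would be triggered.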
   
   ### Changes Made
   - Modified `BloomFilterHandler.needUpdateIndices()` to check for fpp config 
changes on existing bloom filter columns
   - Modified `BloomFilterHandler.updateIndices()` to remove and rebuild bloom 
filters when fpp config has changed
   - Added `isFppChanged()` helper that reads `numHashFunctions` from the 
existing bloom filter data buffer
   - Added `computeExpectedNumHashFunctions()` that mirrors Guava's BloomFilter 
formula to compute the expected number of hash functions from fpp and 
cardinality
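
The two helpers above could look roughly like the sketch below. It is illustrative only and is not the code in this PR: the class name `BloomFppSketch`, the method names, and the use of a plain `ByteBuffer` are hypothetical. It assumes the buffer passed in starts at the Guava-serialized bloom filter bytes (any Pinot-specific header already skipped) and that Guava's `writeTo` layout is one strategy byte, then one unsigned byte for the number of hash functions, then the data.

```java
import java.nio.ByteBuffer;

public class BloomFppSketch {
  // Bits needed for n expected insertions at false positive probability p
  // (mirrors the formula used by Guava's BloomFilter sizing).
  static long optimalNumOfBits(long n, double p) {
    if (p == 0) {
      p = Double.MIN_VALUE; // avoid log(0)
    }
    return (long) (-n * Math.log(p) / (Math.log(2) * Math.log(2)));
  }

  // Hash function count for n insertions and m bits, never less than 1.
  static int optimalNumOfHashFunctions(long n, long m) {
    return Math.max(1, (int) Math.round((double) m / n * Math.log(2)));
  }

  // Hypothetical stand-in for computeExpectedNumHashFunctions().
  static int expectedNumHashFunctions(long cardinality, double fpp) {
    long n = Math.max(1, cardinality); // guard against division by zero
    return optimalNumOfHashFunctions(n, optimalNumOfBits(n, fpp));
  }

  // Assumption: byte 0 is the strategy ordinal, byte 1 is numHashFunctions.
  static int storedNumHashFunctions(ByteBuffer guavaBytes) {
    return guavaBytes.get(guavaBytes.position() + 1) & 0xFF;
  }

  // Hypothetical stand-in for isFppChanged(): compare stored vs. expected k.
  static boolean isFppChanged(ByteBuffer guavaBytes, long cardinality, double newFpp) {
    return storedNumHashFunctions(guavaBytes) != expectedNumHashFunctions(cardinality, newFpp);
  }

  public static void main(String[] args) {
    // Fake serialized prefix: strategy=1, numHashFunctions=3, dataLength=0.
    ByteBuffer existing = ByteBuffer.wrap(new byte[] {1, 3, 0, 0, 0, 0});
    System.out.println(isFppChanged(existing, 1000, 0.1));  // false
    System.out.println(isFppChanged(existing, 1000, 0.01)); // true
    System.out.println(isFppChanged(existing, 1000, 0.09)); // false
  }
}
```

Under this formula, nearby fpp values can round to the same hash function count (0.1 and 0.09 both give 3 at cardinality 1000), so a comparison based only on that count would treat them as unchanged.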
   
   ## Related Issue
   
   Fixes #17137
   
   ## Upgrade Notes
   
   None. This is a purely additive behavior change: bloom filter indexes are 
now rebuilt automatically when the fpp config changes, instead of silently 
keeping the old index.
   
   ## Testing Done
   
   - [x] Unit tests added: `SegmentPreProcessorTest#testBloomFilterFppUpdate` 
(tests both v1 and v3 segment formats)
   - [x] The test creates a bloom filter with fpp=0.1 and confirms no 
processing is needed, changes fpp to 0.01 and confirms processing is needed, 
then rebuilds and confirms no further processing is needed
   - [x] All existing bloom filter tests pass
   - [x] Checkstyle, spotless, and license checks pass


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

