johnsolomonj opened a new issue, #17686:
URL: https://github.com/apache/pinot/issues/17686

   ### Problem Statement
   
   Upsert tables currently cannot use tiered storage, even when segments beyond the upsert window are effectively immutable and could safely be moved to cold storage.
   
   ```java
   // Current validation in TableConfigUtils.java:779-780
   Preconditions.checkState(tableConfig.getTierConfigsList() == null
       || tableConfig.getTierConfigsList().isEmpty(),
       "Tiered storage is not supported for Upsert/Dedup tables");
   ```
   
   This blanket prohibition rules out tiering-based cost savings for upsert tables whose upsert window is time-bounded (configured via `metadataTTL`).
   
   ### Proposed Solution
   
   Allow tiered storage for upsert tables when `metadataTTL < minSegmentAge` (the smallest `segmentAge` across all configured tiers), so that segments move to the cold tier only after their `validDocIds` bitmaps have been frozen.
   
   **Example Configuration:**
   - `metadataTTL`: 3 days (upsert window)
   - `minSegmentAge`: 7 days (tier boundary)
   - Result: Segments older than 7 days can safely move to the cold tier, since they have been outside the upsert window for at least 4 days
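   The arithmetic behind this example can be checked with a small Java sketch. It is a simplification under stated assumptions, not Pinot code: the upsert comparison column is assumed to hold epoch milliseconds, the largest seen comparison value is assumed to track wall-clock time, and segment age is measured from the segment's newest record. The helpers `outOfUpsertWindow()` and `tierEligible()` are hypothetical.

   ```java
   import java.util.concurrent.TimeUnit;

   // Hypothetical model of the example configuration; not Pinot code.
   class UpsertTierWindowExample {
       static final long DAY_MS = TimeUnit.DAYS.toMillis(1);

       // Assumed rule: a segment is outside the upsert window once its newest
       // comparison value is older than (largest seen comparison value - metadataTTL).
       static boolean outOfUpsertWindow(long segmentEndMs, long largestSeenMs, long metadataTtlMs) {
           return segmentEndMs < largestSeenMs - metadataTtlMs;
       }

       // Assumed rule: a segment is eligible for the cold tier once it is older than minSegmentAge.
       static boolean tierEligible(long segmentEndMs, long nowMs, long minSegmentAgeMs) {
           return nowMs - segmentEndMs > minSegmentAgeMs;
       }

       public static void main(String[] args) {
           long metadataTtlMs = 3 * DAY_MS;        // upsert window
           long minSegmentAgeMs = 7 * DAY_MS;      // tier boundary
           long nowMs = 100 * DAY_MS;              // assumed equal to the largest seen comparison value
           long segmentEndMs = nowMs - 8 * DAY_MS; // an 8-day-old segment

           System.out.println(tierEligible(segmentEndMs, nowMs, minSegmentAgeMs));    // true
           System.out.println(outOfUpsertWindow(segmentEndMs, nowMs, metadataTtlMs)); // true
           // Minimum time spent outside the upsert window before tiering, in days.
           System.out.println((minSegmentAgeMs - metadataTtlMs) / DAY_MS);            // 4
       }
   }
   ```

   Under these assumptions, any segment old enough to tier has already spent at least `minSegmentAge - metadataTTL` outside the upsert window. If ingestion lags so that the largest seen comparison value trails wall-clock time, that margin shrinks by the size of the lag.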
   
   ### Why This Is Safe
   
   1. **Bitmaps freeze after TTL**: Once a segment falls outside the `metadataTTL` window, the metadata manager stops tracking it, and its `validDocIds` bitmap receives no further updates (see `isOutOfMetadataTTL()` in `BasePartitionUpsertMetadataManager`)
   
   2. **No write conflicts**: Expired segments are removed from `_trackedSegments`, and their primary keys are removed from `_primaryKeyToRecordLocationMap`, so no upsert operation will attempt to modify their bitmaps
   
   3. **Bitmap storage model supports this**: Bitmaps are persisted alongside segments as `validdocids.bitmap.snapshot` files and move with the segment to the cold tier. Queries only need read access to them.
   
   4. **Proven pattern**: Dedup tables already support this exact approach (added in PR #17154, commit `ccc41ea8e5`):
      ```java
      // From validateTTLAndTierConfigsForDedupTable()
      Preconditions.checkState(ttlInMs < minSegmentAgeInMs,
          "MetadataTTL: %s(ms) must be smaller than the minimum segmentAge: %s(ms)",
          ttlInMs, minSegmentAgeInMs);
      ```
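   The freezing argument in points 1 and 2 can be illustrated with a toy model. This is a hypothetical sketch, not Pinot's `BasePartitionUpsertMetadataManager`: `FrozenBitmapModel`, `expire()`, and the field names are invented for illustration, `java.util.BitSet` stands in for the real bitmaps, and TTL expiry is triggered by hand rather than derived from comparison values. It shows that once a segment's keys are dropped from the primary-key map, a later upsert of the same key cannot touch the expired segment's bitmap.

   ```java
   import java.util.BitSet;
   import java.util.HashMap;
   import java.util.HashSet;
   import java.util.Map;
   import java.util.Set;

   // Toy model of one upsert partition (hypothetical, not Pinot code).
   class FrozenBitmapModel {
       // Where the latest record for a primary key lives.
       record Location(String segment, int docId) {}

       final Map<String, BitSet> validDocIds = new HashMap<>();
       final Set<String> trackedSegments = new HashSet<>();                // plays the role of _trackedSegments
       final Map<String, Location> primaryKeyToLocation = new HashMap<>(); // plays the role of the PK map

       void addSegment(String segment) {
           validDocIds.put(segment, new BitSet());
           trackedSegments.add(segment);
       }

       // Marks the new doc valid and invalidates the key's previous location, if it is still mapped.
       void upsert(String primaryKey, String segment, int docId) {
           Location previous = primaryKeyToLocation.put(primaryKey, new Location(segment, docId));
           if (previous != null) {
               validDocIds.get(previous.segment()).clear(previous.docId());
           }
           validDocIds.get(segment).set(docId);
       }

       // Hypothetical stand-in for TTL expiry: stop tracking the segment and drop keys pointing into it.
       void expire(String segment) {
           trackedSegments.remove(segment);
           primaryKeyToLocation.values().removeIf(loc -> loc.segment().equals(segment));
       }

       public static void main(String[] args) {
           FrozenBitmapModel model = new FrozenBitmapModel();
           model.addSegment("seg_old");
           model.upsert("pk1", "seg_old", 0);
           model.expire("seg_old"); // seg_old falls outside metadataTTL
           BitSet snapshot = (BitSet) model.validDocIds.get("seg_old").clone();

           model.addSegment("seg_new");
           model.upsert("pk1", "seg_new", 0); // the same key arrives again
           // seg_old's bitmap is unchanged, so it can be served read-only from another tier.
           System.out.println(snapshot.equals(model.validDocIds.get("seg_old"))); // prints true
       }
   }
   ```

   The model only demonstrates the absence of writes to the expired segment; in it, the re-inserted `pk1` is valid in both segments, because the expired location is no longer known to the map.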
   
   ### Implementation Approach
   
   1. Update the validation in `TableConfigUtils.validateUpsertAndDedupConfig()` to allow tier configs when the TTL constraint is met
   2. Add a `validateTTLAndTierConfigsForUpsertTable()` method mirroring the existing dedup validation
   3. Update the segment assignment policy to use multi-tier assignment when the constraint is satisfied
   4. Require `metadataTTL > 0` when tiered storage is configured for upsert tables
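   A minimal sketch of the proposed `validateTTLAndTierConfigsForUpsertTable()` check (steps 2 and 4), modeled on the dedup message quoted earlier. It assumes `metadataTTL` has already been converted to milliseconds; `TierConfig` here is a stand-in record rather than Pinot's class, `periodToMillis()` is a hypothetical parser for `7d`-style ages, and a plain `IllegalStateException` replaces Guava's `Preconditions.checkState` (which throws the same exception type).

   ```java
   import java.util.List;
   import java.util.concurrent.TimeUnit;

   // Hypothetical sketch of the proposed upsert validation; not Pinot code.
   class UpsertTierValidationSketch {
       // Stand-in for a tier config: only the fields this check needs.
       record TierConfig(String name, String segmentAge) {}

       // Hypothetical parser for ages like "7d", "12h", "30m", "45s".
       static long periodToMillis(String period) {
           long amount = Long.parseLong(period.substring(0, period.length() - 1));
           char unit = period.charAt(period.length() - 1);
           switch (unit) {
               case 'd': return TimeUnit.DAYS.toMillis(amount);
               case 'h': return TimeUnit.HOURS.toMillis(amount);
               case 'm': return TimeUnit.MINUTES.toMillis(amount);
               case 's': return TimeUnit.SECONDS.toMillis(amount);
               default: throw new IllegalArgumentException("Unsupported period: " + period);
           }
       }

       static void validateTTLAndTierConfigsForUpsertTable(long metadataTtlMs, List<TierConfig> tiers) {
           if (tiers == null || tiers.isEmpty()) {
               return; // no tiers configured: existing upsert tables are unaffected
           }
           if (metadataTtlMs <= 0) {
               throw new IllegalStateException(
                   "metadataTTL must be > 0 when tiered storage is configured for upsert tables");
           }
           // The youngest tier boundary is the one that matters.
           long minSegmentAgeMs = Long.MAX_VALUE;
           for (TierConfig tier : tiers) {
               minSegmentAgeMs = Math.min(minSegmentAgeMs, periodToMillis(tier.segmentAge()));
           }
           if (metadataTtlMs >= minSegmentAgeMs) {
               throw new IllegalStateException(String.format(
                   "MetadataTTL: %s(ms) must be smaller than the minimum segmentAge: %s(ms)",
                   metadataTtlMs, minSegmentAgeMs));
           }
       }

       public static void main(String[] args) {
           long metadataTtlMs = TimeUnit.DAYS.toMillis(3);
           // 3d TTL below a 7d tier boundary: accepted.
           validateTTLAndTierConfigsForUpsertTable(metadataTtlMs, List.of(new TierConfig("coldTier", "7d")));
           try {
               // 3d TTL above a 2d tier boundary: rejected.
               validateTTLAndTierConfigsForUpsertTable(metadataTtlMs, List.of(new TierConfig("coldTier", "2d")));
           } catch (IllegalStateException e) {
               System.out.println("rejected: " + e.getMessage());
           }
       }
   }
   ```

   Taking the minimum across all tiers matters: a segment can move to any tier whose `segmentAge` it exceeds, so the TTL must stay below the youngest tier boundary.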
   
   ### Benefits
   
   - **Storage cost reduction**: Move cold segments to cheaper storage tiers
   - **No correctness impact**: Segments move only after their bitmaps are frozen, so queries read the same `validDocIds` regardless of which tier serves them
   - **Consistent with dedup**: Uses the same validation pattern already proven in production
   - **Backward compatible**: Existing upsert tables without tier configs are unaffected
   
   ### References
   
   - Dedup multi-tier support: PR #17154 (commit `ccc41ea8e5`)
   - Validation logic: `pinot-segment-local/src/main/java/org/apache/pinot/segment/local/utils/TableConfigUtils.java`
   - Bitmap freezing: `BasePartitionUpsertMetadataManager.isOutOfMetadataTTL()`
   