johnsolomonj opened a new issue, #17686:
URL: https://github.com/apache/pinot/issues/17686
### Problem Statement
Upsert tables currently cannot use tiered storage, even when segments beyond
the upsert window are effectively immutable and could safely be moved to cold
storage.
```java
// Current validation in TableConfigUtils.java:779-780
Preconditions.checkState(
    tableConfig.getTierConfigsList() == null || tableConfig.getTierConfigsList().isEmpty(),
    "Tiered storage is not supported for Upsert/Dedup tables");
```
This blanket prohibition prevents cost optimization for upsert tables with
time-bounded upsert windows (configured via `metadataTTL`).
### Proposed Solution
Allow tiered storage for upsert tables when `metadataTTL < minSegmentAge`,
ensuring segments only move to cold tier after their validDocIds bitmaps have
been frozen.
**Example Configuration:**
- `metadataTTL`: 3 days (upsert window)
- `minSegmentAge`: 7 days (tier boundary)
- Result: Segments 7+ days old can safely move to the cold tier, since they
have been outside the upsert window (and so have received no upserts) for at
least 4 days
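
The arithmetic behind this example can be sketched as follows. This is only an illustration: `TierMarginExample` and `frozenMargin` are hypothetical names, not Pinot code, and both settings are assumed to be expressible as plain durations.

```java
import java.time.Duration;

// Hypothetical illustration, not part of Pinot.
class TierMarginExample {
  // How long a segment has already been outside the upsert window
  // at the moment it first becomes eligible for the cold tier.
  static Duration frozenMargin(Duration metadataTtl, Duration minSegmentAge) {
    return minSegmentAge.minus(metadataTtl);
  }

  public static void main(String[] args) {
    Duration metadataTtl = Duration.ofDays(3);    // upsert window
    Duration minSegmentAge = Duration.ofDays(7);  // tier boundary
    Duration margin = frozenMargin(metadataTtl, minSegmentAge);
    System.out.println("Frozen for at least " + margin.toDays() + " days before tiering");
  }
}
```

A positive margin corresponds to the proposed `metadataTTL < minSegmentAge` constraint; a zero or negative margin would mean a segment could move while its bitmap may still change.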
### Why This Is Safe
1. **Bitmaps freeze after TTL**: Once a segment passes `metadataTTL`, the
metadata manager stops tracking it and its `validDocIds` bitmap never receives
updates again (see `isOutOfMetadataTTL()` in
`BasePartitionUpsertMetadataManager`)
2. **No write conflicts**: Expired segments are removed from
`_trackedSegments` and `_primaryKeyToRecordLocationMap`, so no upsert
operations will attempt to modify their bitmaps
3. **Bitmap storage model supports this**: Bitmaps are persisted with
segments as `validdocids.bitmap.snapshot` files and move with the segment to
cold tier. Queries only need read access.
4. **Proven pattern**: Dedup tables already support this exact approach
(added in PR #17154, commit `ccc41ea8e5`):
```java
// From validateTTLAndTierConfigsForDedupTable()
Preconditions.checkState(ttlInMs < minSegmentAgeInMs,
    "MetadataTTL: %s(ms) must be smaller than the minimum segmentAge: %s(ms)",
    ttlInMs, minSegmentAgeInMs);
```
### Implementation Approach
1. Update validation in `TableConfigUtils.validateUpsertAndDedupConfig()` to
allow tier configs when the TTL constraint is met
2. Add `validateTTLAndTierConfigsForUpsertTable()` method similar to
existing dedup validation
3. Update segment assignment policy to use multi-tier assignment when
constraint is satisfied
4. Require `metadataTTL > 0` when tiered storage is configured for upsert
tables
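
A minimal sketch of what steps 2 and 4 might look like, written without Pinot dependencies so it can run on its own. The class and method names are hypothetical stand-ins for the proposed `validateTTLAndTierConfigsForUpsertTable()`, and the sketch assumes that both `metadataTTL` and each tier's segment age have already been converted to milliseconds. It throws `IllegalStateException`, the same exception type that `Preconditions.checkState` raises.

```java
import java.util.List;

// Hypothetical stand-in for the proposed upsert validation; not Pinot code.
class UpsertTierValidationSketch {
  // metadataTtlMs: upsert metadataTTL, assumed to be already converted to ms.
  // tierSegmentAgesMs: segment age threshold of each configured tier, in ms.
  static void validateTtlAndTierConfigs(long metadataTtlMs, List<Long> tierSegmentAgesMs) {
    if (tierSegmentAgesMs == null || tierSegmentAgesMs.isEmpty()) {
      return; // No tiers configured: existing upsert tables are unaffected.
    }
    if (metadataTtlMs <= 0) {
      throw new IllegalStateException(
          "metadataTTL must be > 0 when tiered storage is configured for upsert tables");
    }
    // The constraint must hold for the youngest tier boundary.
    long minSegmentAgeMs = Long.MAX_VALUE;
    for (long ageMs : tierSegmentAgesMs) {
      minSegmentAgeMs = Math.min(minSegmentAgeMs, ageMs);
    }
    if (metadataTtlMs >= minSegmentAgeMs) {
      throw new IllegalStateException(String.format(
          "MetadataTTL: %d(ms) must be smaller than the minimum segmentAge: %d(ms)",
          metadataTtlMs, minSegmentAgeMs));
    }
  }

  public static void main(String[] args) {
    long dayMs = 24L * 60 * 60 * 1000;
    // 3-day TTL with a 7-day tier boundary satisfies the constraint.
    validateTtlAndTierConfigs(3 * dayMs, List.of(7 * dayMs));
    System.out.println("3d TTL / 7d tier: accepted");
  }
}
```

Taking the minimum across all tiers is a design choice in this sketch: it ensures that no tier can receive a segment whose bitmap might still be updated.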
### Benefits
- **Storage cost reduction**: Move cold segments to cheaper storage tiers
- **No correctness impact**: Segments move only after their `validDocIds` bitmaps are frozen, so query results are unchanged
- **Consistent with dedup**: Uses same validation pattern already proven in
production
- **Backward compatible**: Existing upsert tables without tiered configs are
unaffected
### References
- Dedup multi-tier support: PR #17154 (commit `ccc41ea8e5`)
- Validation logic:
`pinot-segment-local/src/main/java/org/apache/pinot/segment/local/utils/TableConfigUtils.java`
- Bitmap freezing: `BasePartitionUpsertMetadataManager.isOutOfMetadataTTL()`