LemonCL opened a new pull request, #60950:
URL: https://github.com/apache/doris/pull/60950

   ## Proposed changes
   
   This PR fixes a critical synchronization issue between 
`Tablet::_cumulative_point` (in-memory) and 
`TabletMeta::_cumulative_layer_point` (persistent in RocksDB), causing 
cumulative compaction to lose progress after BE restart.
   
   ## Problem description
   
   ### Root Cause
   
   There are two separate variables tracking the cumulative point:
   - `Tablet::_cumulative_point` - Runtime value, updated by compaction
   - `TabletMeta::_cumulative_layer_point` - Persistent value, stored in RocksDB
   
   **These two were never synchronized**, leading to:
   
   1. **After compaction**: `_cumulative_point` updated to 100, but 
`_cumulative_layer_point` remains -1
   2. **After BE restart**: Tablet loads `_cumulative_layer_point = -1` from 
RocksDB, but constructor hardcoded `_cumulative_point = -1`
   3. **Result**: Cumulative compaction restarts from scratch after every BE 
restart
   
   ### Impact
   
   - ❌ Cumulative compaction progress lost after BE restart
   - ❌ Clone/Restore scenarios lose cumulative point information
   - ❌ Inefficient compaction due to repeated work
   
   ## What changes were proposed in this pull request?
   
   Add bidirectional synchronization between the two cumulative point variables:
   
   ### 1. Load Path (BE Restart/Clone/Restore)
   
   **File**: `be/src/olap/tablet.cpp:260`
   
   ```cpp
   // Before: hardcoded to -1
   _cumulative_point(K_INVALID_CUMULATIVE_POINT),
   
   // After: load from TabletMeta (which was deserialized from RocksDB)
   _cumulative_point(_tablet_meta->cumulative_layer_point()),
   ```
   
   **Data Flow**:
   ```
   BE Startup
     → DataDir::load()
       → TabletMetaManager::traverse_headers() [RocksDB iteration]
         → TabletManager::load_tablet_from_meta(meta_binary)
           → TabletMeta::deserialize(meta_binary)
             → _cumulative_layer_point = pb.cumulative_layer_point()  [from 
RocksDB]
           → Tablet::Tablet(tablet_meta)
             → _cumulative_point = _tablet_meta->cumulative_layer_point()  
[FIXED]
   ```
   
   ### 2. Save Path (After Compaction)
   
   **File**: `be/src/olap/tablet.cpp:341`
   
   ```cpp
   void Tablet::save_meta() {
       check_table_size_correctness();
       // NEW: Sync in-memory value to TabletMeta before persisting
       _tablet_meta->set_cumulative_layer_point(_cumulative_point);
       auto res = _tablet_meta->save_meta(_data_dir);
       // ...
   }
   ```
   
   **Data Flow**:
   ```
   CumulativeCompaction::execute_compact()
     → update_cumulative_point()
       → Tablet::set_cumulative_layer_point(100)  [updates _cumulative_point]
     → Tablet::save_meta()
       → _tablet_meta->set_cumulative_layer_point(_cumulative_point)  [NEW: 
sync]
       → TabletMeta::save_meta()
         → serialize to RocksDB
   ```
   
   ## Types of changes
   
   What types of changes does your code introduce to Doris?
   
   - [x] Bugfix (non-breaking change which fixes an issue)
   - [ ] New feature (non-breaking change which adds functionality)
   - [ ] Breaking change (fix or feature that could cause existing 
functionality to not work as expected)
   - [ ] Documentation Update (if none of the other choices apply)
   - [ ] Code refactor (Modify the code structure, format or style, but without 
changing the code logic)
   - [ ] Optimization. Including functional usability improvements and 
performance improvements.
   - [ ] Dependency. Such as changes related to third-party components.
   
   ## Checklist
   
   - [x] I have read the [**CLA 
Document**](https://cla-assistant.io/apache/doris) and made sure all commits 
are signed-off.
   - [x] I have added necessary documentation (if appropriate)
   - [x] Any dependent changes have been merged
   - [ ] I have added test cases that prove my fix is effective or that my 
feature works
   
   ## Further comments
   
   ### Why is this safe?
   
   **For existing tablets (BE restart/clone/restore)**:
   - `TabletMeta::_cumulative_layer_point` is correctly loaded from RocksDB via 
`init_from_pb()`
   - Constructor now reads this value instead of hardcoding to -1
   - ✅ Restores the previously saved cumulative point
   
   **For new tablet creation**:
   - `TabletMeta` constructor sets `_cumulative_layer_point = -1` (line 171 in 
tablet_meta.cpp)
   - Constructor reads this value: `_cumulative_point = -1`
   - ✅ Still correctly initializes new tablets
   
   ### Related TODO Comment
   
   This fix also resolves an existing TODO:
   ```cpp
   // TODO(ygl): lost some information here, such as cumulative layer point
   // engine_storage_migration_task.cpp:348
   ```
   
   ### Test Plan
   
   Manual testing:
   1. Create tablet and run cumulative compaction
   2. Check `_cumulative_point` is updated (e.g., to 100)
   3. Restart BE
   4. Verify `_cumulative_point` is 100 (not -1)
   
   Existing tests should pass as this doesn't change behavior for correctly 
implemented code paths.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to