peterylh opened a new pull request, #57575:
URL: https://github.com/apache/doris/pull/57575

   # Profile Archive Feature
   
   ## Summary
   
   Add an automatic profile archiving feature to preserve query profiles beyond 
the current memory and disk limits. When profile storage reaches capacity, old 
profiles are archived to compressed ZIP files instead of being deleted, which 
enables long-term profile retention for troubleshooting and analysis.
   
   ## Problem
   
   Currently, Doris has strict limits on profile storage:
   - **Memory profiles**: Maximum 500 profiles (`max_query_profile_num`)
   - **Spilled profiles**: Maximum 500 profiles (`max_spilled_profile_num`)
   - **Storage size**: Maximum 1GB (`spilled_profile_storage_limit_bytes`)
   
   When these limits are exceeded, old profiles are **permanently deleted**, 
making it impossible to analyze historical slow queries beyond the retention 
window. This is problematic for:
   - Post-incident analysis of production issues
   - Long-term performance trend analysis
   - Debugging intermittent query problems
   
   ## Solution
   
   Implement an automatic profile archiving system that:
   1. **Moves** outdated profiles to an archive directory instead of deleting 
them
   2. **Batches** profiles into compressed ZIP files for efficient storage
   3. **Retains** profiles for a configurable period (default 7 days)
   4. **Provides** predictable file naming for easy profile location
   
   ### Key Features
   
   - **Pending Buffer Strategy**: Profiles are staged in `archive/pending/` 
before archiving so that they can be grouped into full batches
   - **Dual Trigger Mechanism**:
     - Archive when batch size reaches configured limit (default 100 profiles)
     - Archive when oldest pending file exceeds timeout (default 24 hours)
   - **Automatic Cleanup**: Removes archives older than the configurable 
retention period
   - **Graceful Degradation**: Falls back to direct deletion if archiving fails
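
The dual trigger can be read as a single predicate over the pending directory. The following is a minimal sketch, not the PR's Java implementation: `should_archive` and its parameter names are hypothetical, and the defaults mirror the values stated above (100 profiles, 24 hours).

```python
def should_archive(pending_count: int,
                   oldest_pending_age_seconds: float,
                   batch_size: int = 100,
                   pending_timeout_seconds: int = 86400) -> bool:
    """Return True when the pending buffer should be flushed to a ZIP.

    Hypothetical helper illustrating the two triggers described above.
    """
    if pending_count == 0:
        return False  # nothing to archive
    # Trigger 1: enough profiles have accumulated for a full batch.
    if pending_count >= batch_size:
        return True
    # Trigger 2: the oldest pending profile has waited longer than the timeout.
    return oldest_pending_age_seconds > pending_timeout_seconds
```

With the defaults, 99 fresh pending profiles are left alone, while a single profile that has waited more than 24 hours is archived on its own.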
   
   ## Implementation Details
   
   ### Directory Structure
   
   ```
   ${LOG_DIR}/profile/
   ├── {timestamp}_{queryid}.zip          # Active spilled profiles
   └── archive/                           # Archive root
       ├── pending/                       # Staging area for batching
       │   └── {timestamp}_{queryid}.zip
       └── profiles_20240101_000000_20240101_235959.zip  # Archived batches
   ```
   
   ### Archive File Naming
   
   Archive ZIPs follow the naming pattern: 
`profiles_{start_timestamp}_{end_timestamp}.zip`
   - `start_timestamp`: Earliest profile in the batch (YYYYMMDD_HHMMSS)
   - `end_timestamp`: Latest profile in the batch (YYYYMMDD_HHMMSS)
   
   This enables quick location of profiles by query time.
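
Because the time range is encoded in the file name, a candidate archive can be selected without opening it. The sketch below parses the documented pattern and checks whether a query time falls inside the range. The function names are hypothetical, and it assumes the query time is expressed in the same time zone as the timestamps in the file name.

```python
import re
from datetime import datetime
from typing import Optional, Tuple

# Pattern from the PR description: profiles_{start}_{end}.zip,
# where each timestamp is YYYYMMDD_HHMMSS.
ARCHIVE_NAME = re.compile(r"^profiles_(\d{8}_\d{6})_(\d{8}_\d{6})\.zip$")
TS_FORMAT = "%Y%m%d_%H%M%S"


def parse_archive_range(filename: str) -> Optional[Tuple[datetime, datetime]]:
    """Return (start, end) parsed from an archive name, or None if it does not match."""
    match = ARCHIVE_NAME.match(filename)
    if match is None:
        return None
    start = datetime.strptime(match.group(1), TS_FORMAT)
    end = datetime.strptime(match.group(2), TS_FORMAT)
    return start, end


def may_contain(filename: str, query_time: datetime) -> bool:
    """True if query_time falls inside the range encoded in the archive name."""
    parsed = parse_archive_range(filename)
    if parsed is None:
        return False
    start, end = parsed
    return start <= query_time <= end
```

A match only says the archive covers that time window; whether the profile is actually present still has to be confirmed by listing the ZIP contents.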
   
   ### Workflow
   
   ```
   Query Profile Creation
       ↓
   Memory Storage (max 500)
       ↓
   Spilled to Disk (when memory full)
       ↓
   Periodic Cleanup (every 1s)
       ↓
   Move to archive/pending/ (when limits exceeded)
       ↓
   Archive to ZIP (batch size reached OR timeout exceeded)
       ↓
   Delete Pending Files
       ↓
   Cleanup Old Archives (every 24h, default retention 7 days)
   ```
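
The "Archive to ZIP" and "Delete Pending Files" steps above could look roughly like the following. This is an illustrative sketch rather than the code in `ProfileArchiveManager.java`: `archive_pending` is a hypothetical name, and it assumes that the `{timestamp}` prefix of pending files is epoch milliseconds and that archive names are rendered in UTC, neither of which is stated in this description.

```python
import zipfile
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def archive_pending(pending_dir: Path, archive_dir: Path) -> Optional[Path]:
    """Pack every file in pending_dir into one ZIP, then delete the originals.

    Assumes (hypothetically) that pending files are named
    "<epoch_millis>_<queryid>.zip"; the real timestamp format may differ.
    """
    files = sorted(pending_dir.glob("*_*.zip"),
                   key=lambda p: int(p.name.split("_", 1)[0]))
    if not files:
        return None

    def stamp(path: Path) -> str:
        # Render the file's timestamp prefix as YYYYMMDD_HHMMSS in UTC.
        millis = int(path.name.split("_", 1)[0])
        moment = datetime.fromtimestamp(millis / 1000, tz=timezone.utc)
        return moment.strftime("%Y%m%d_%H%M%S")

    archive_dir.mkdir(parents=True, exist_ok=True)
    target = archive_dir / f"profiles_{stamp(files[0])}_{stamp(files[-1])}.zip"
    with zipfile.ZipFile(target, "w", compression=zipfile.ZIP_DEFLATED) as batch:
        for path in files:
            batch.write(path, arcname=path.name)
    # Delete pending files only after the archive has been written and closed.
    for path in files:
        path.unlink()
    return target
```

Deleting the pending files last means a failure while writing the ZIP leaves the originals in place, which fits the fallback behavior described under Graceful Degradation.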
   
   ### Code Changes
   
   **New Files:**
   - `ProfileArchiveManager.java` (682 lines) - Core archiving logic
   
   **Modified Files:**
   - `Config.java` (+35 lines) - Configuration parameters
   - `ProfileManager.java` (+157/-17 lines) - Integration with archive system
   
   **Test Files:**
   - `ProfileArchiveManagerTest.java` (+1111 lines) - 26 comprehensive test 
cases
   - `ProfileManagerTest.java` (+227 lines) - Integration tests
   
   
   ## Configuration
   
   All parameters have defaults and can be tuned in the FE configuration:
   
   | Parameter | Type | Default | Description |
   |-----------|------|---------|-------------|
   | `enable_profile_archive` | boolean | `true` | Enable/disable profile archiving |
   | `profile_archive_batch_size` | int | `100` | Number of profiles per ZIP file |
   | `profile_archive_path` | String | `""` | Custom archive path (empty = use default `${spilled_profile_storage_path}/archive`) |
   | `profile_archive_retention_seconds` | int | `604800` | Archive retention period in seconds (7 days). Set to `-1` for unlimited retention, `0` to disable archiving |
   | `profile_archive_pending_timeout_seconds` | int | `86400` | Maximum wait time for pending files in seconds (24 hours). Force archive even if batch is not full |
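
The retention semantics in the table can be sketched as a small check. `is_expired` is a hypothetical helper, not a function from the PR. It covers only the cleanup decision for archives that already exist, so the `0` case (archiving disabled) is assumed to be handled before any archive is written.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86400


def is_expired(archive_age_seconds: float,
               retention_seconds: int = 7 * SECONDS_PER_DAY) -> bool:
    """Hypothetical helper: should cleanup remove an archive of this age?

    -1 means unlimited retention, as stated in the table above.
    """
    if retention_seconds == -1:
        return False  # keep archives forever
    return archive_age_seconds > retention_seconds
```

The same constant is handy for deriving configuration values: `30 * SECONDS_PER_DAY` gives the 30-day setting of `2592000`.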
   
   ### Configuration Examples
   
   ```properties
   # Increase batch size for larger archives (reduces file count)
   profile_archive_batch_size = 1000
   
   # Keep archives for 30 days
   profile_archive_retention_seconds = 2592000
   
   # Use custom archive path (e.g., mounted network storage)
   profile_archive_path = /mnt/nfs/doris-profiles/archive
   
   # Force archive after 12 hours instead of 24
   profile_archive_pending_timeout_seconds = 43200
   
   # Disable archiving (keep current behavior)
   enable_profile_archive = false
   ```
   
   
   
   ## Usage
   
   ### For System Administrators
   
   **Step 1: Locate Slow Query**
   ```sql
   SELECT query_id, time, frontend_ip, query_time
   FROM __internal_schema.audit_log
   WHERE time >= NOW() - INTERVAL 1 DAY
     AND query_time > 10000
   ORDER BY query_time DESC;
   ```
   
   **Step 2: Find Archive File**
   ```bash
   # Replace <frontend_ip> with the frontend_ip value returned in Step 1
   ssh 'user@<frontend_ip>'
   cd "${LOG_DIR}/profile/archive"
   ls -lh profiles_*.zip
   ```
   
   **Step 3: Extract and Analyze**
   ```bash
   # Replace <query_id> and <timestamp> with the values for the query found in Step 1
   unzip profiles_20240101_120000_20240101_130000.zip -d /tmp/analysis/
   ls /tmp/analysis/ | grep '<query_id>'
   vim '/tmp/analysis/<timestamp>_<query_id>.profile'
   ```
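
When many archives cover the time window, the lookup can be scripted instead of unzipping each file by hand. The sketch below lists ZIP entries without extracting them. `find_profile` is a hypothetical helper, and it assumes only that entry names contain the query ID, as the `grep` step above implies.

```python
import zipfile
from pathlib import Path
from typing import List, Tuple


def find_profile(archive_dir: Path, query_id: str) -> List[Tuple[Path, str]]:
    """Return (archive path, entry name) pairs whose entry name contains query_id."""
    hits = []
    for archive in sorted(archive_dir.glob("profiles_*.zip")):
        with zipfile.ZipFile(archive) as batch:
            for name in batch.namelist():
                if query_id in name:
                    hits.append((archive, name))
    return hits
```

Each hit names the archive to extract and the entry to open, so only the matching archive needs to be unpacked.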
   
   ### Space Management
   
   ```bash
   # Check archive storage usage
   du -sh ${LOG_DIR}/profile/archive
   
   # Manual cleanup (if needed beyond automatic retention)
   find ${LOG_DIR}/profile/archive -name "profiles_*.zip" -mtime +90 -delete
   ```
   
   
   
   ## Backward Compatibility
   
   - **Fully backward compatible** - existing profile storage continues to work
   - **Default enabled** - archives are created automatically
   - **Can be disabled** - set `enable_profile_archive = false` to restore old 
behavior
   - **No schema changes** - no database migration required
   
   
   
   
   
   
   ## Check List (For Author)
   
   - Test <!-- At least one of them must be included. -->
       - [ ] Regression test
       - [X] Unit Test
       - [ ] Manual test (add detailed scripts or steps below)
       - [ ] No need to test or manual test. Explain why:
           - [ ] This is a refactor/code format and no logic has been changed.
           - [ ] Previous test can cover this change.
           - [ ] No code files have been changed.
           - [ ] Other reason <!-- Add your reason?  -->
   
   - Behavior changed:
       - [ ] No.
       - [ ] Yes. <!-- Explain the behavior change -->
   
   - Does this need documentation?
       - [ ] No.
       - [ ] Yes. <!-- Add document PR link here. eg: 
https://github.com/apache/doris-website/pull/1214 -->
   
   ### Check List (For Reviewer who merge this PR)
   
   - [ ] Confirm the release note
   - [ ] Confirm test cases
   - [ ] Confirm document
   - [ ] Add branch pick label <!-- Add branch pick label that this PR should 
merge into -->
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

