chj9 opened a new issue, #13383:
URL: https://github.com/apache/skywalking/issues/13383
### Search before asking

- [x] I had searched in the [issues](https://github.com/apache/skywalking/issues?q=is%3Aissue) and found no similar feature requirement.

### Description

#### **1. Background**

The current implementation writes data to Elasticsearch by creating a new index for each day (e.g., `skywalking_segment-20250724`). While this approach is straightforward for low data volumes, it presents significant challenges as the amount of data grows.

When daily data volume is high, this strategy leads to massive single-day indices (potentially hundreds of gigabytes), causing severe issues:

* **Degraded Query Performance:** Querying a massive index consumes substantial memory and CPU, resulting in slow queries or even timeouts. This negatively impacts user experience and data analysis efficiency.
* **Unbalanced Shards:** Shards for high-volume days become excessively large, while shards for low-volume days remain small, leading to inefficient resource allocation.
* **Complex Manual Management:** The application code or external cron jobs must handle the logic for creating and deleting indices, increasing code complexity and maintenance overhead.

<img width="561" height="633" alt="Image" src="https://github.com/user-attachments/assets/0c281027-94d9-412f-92a1-59e34c28a6ed" />

#### **2. Proposed Solution**

We propose migrating from the daily index pattern to a strategy that leverages Elasticsearch's built-in **Index Lifecycle Management (ILM)** combined with a **Rollover Alias**.

The core concept of this strategy is:

* **Write to a single, fixed alias** (e.g., `skywalking_segment`). Both writes and queries will target this alias.
* **Automate index management with an ILM policy.** When an index meets a defined condition (e.g., a primary shard reaches `15GB` or the index's age reaches `2d`), ILM automatically creates a new index and seamlessly switches the write alias (`is_write_index: true`) to it.
* **Automate data retention.** The ILM policy will also automatically handle the lifecycle of old data, such as deleting it after 7 days, without any external intervention.

#### **3. Implementation Steps**

The complete implementation involves the following four key steps:

**Step 1: Create an ILM Policy**

Define a policy that specifies the conditions for the rollover and delete actions.

```json
PUT _ilm/policy/skywalking_segment_ilm_policy
{
  "policy": {
    "phases": {
      "hot": {
        "min_age": "0ms",
        "actions": {
          "rollover": {
            "max_primary_shard_size": "15gb",
            "max_age": "2d"
          }
        }
      },
      "delete": {
        "min_age": "7d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}
```

* **Explanation:** A rollover is triggered when a primary shard reaches `15GB` or the index is `2` days old. The data will be automatically deleted after `7` days.

**Step 2: Create an Index Template**

Create a template to automatically apply the ILM policy and settings to all new indices matching the `skywalking_segment-*` pattern.

```json
PUT _index_template/skywalking_segment_template
{
  "index_patterns": [
    "skywalking_segment-*"
  ],
  "template": {
    "settings": {
      "index": {
        "refresh_interval": "5s",
        "number_of_shards": "5",
        "number_of_replicas": "0",
        "lifecycle": {
          "name": "skywalking_segment_ilm_policy",
          "rollover_alias": "skywalking_segment"
        }
      }
    },
    "mappings": {
      "properties": {
        "message": {
          "type": "text"
        }
      }
    }
  }
}
```

* **Explanation:** Any new index with a name starting with `skywalking_segment-` will be associated with the `skywalking_segment_ilm_policy` and use `skywalking_segment` as its rollover alias.
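As an optional sanity check between Steps 2 and 3 (not part of the proposal itself), the policy and template can be inspected with standard Elasticsearch requests, and the settings a new backing index would receive can be previewed; the `_simulate_index` endpoint assumes a version of Elasticsearch that supports composable index templates:

```json
GET _ilm/policy/skywalking_segment_ilm_policy

GET _index_template/skywalking_segment_template

POST _index_template/_simulate_index/skywalking_segment-000001
```

The simulate response should show the resolved settings for the would-be index, including `index.lifecycle.name` and `index.lifecycle.rollover_alias`, before any real index is created.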
**Step 3: Create the Bootstrap Index**

Manually create the very first index and assign the alias to it, explicitly marking it as the write index.

```json
PUT skywalking_segment-000001
{
  "aliases": {
    "skywalking_segment": {
      "is_write_index": true
    }
  }
}
```

* **Explanation:** This is the "seed" index to start the process. All subsequent indices (`skywalking_segment-000002`, `skywalking_segment-000003`, etc.) will be created and managed automatically by ILM.

**Step 4: Modify Application Code**

This is the most critical change. All logic in the application code that writes to and queries from Elasticsearch must be updated (a request-level sketch of the new targets is included at the end of this issue):

* **Write Operations:** The target destination should be changed from a dynamic, date-based index name (e.g., `skywalking_segment-20250724`) to the **fixed alias** `skywalking_segment`.
* **Query Operations:** The query target should also be unified to the alias `skywalking_segment`. Since the alias points to all relevant active indices (e.g., `skywalking_segment-000001`, `skywalking_segment-000002`), querying the alias will search across all necessary data.

#### **4. Advantages**

Adopting this solution will yield significant benefits:

1. **Automated Lifecycle Management:** Eliminates the need for complex index creation/deletion logic in the application code, handing over responsibility to Elasticsearch and reducing maintenance costs.
2. **Balanced Shard Sizes:** By controlling shard size with `max_primary_shard_size`, we ensure that every shard remains within a healthy and efficient size range, preventing giant shards.
3. **Improved Query Performance:** Smaller, well-balanced shards lead to faster query speeds and more stable performance.
4. **Simplified Application Logic:** The application code is decoupled from physical index names and timing concerns; it only needs to interact with a fixed alias.
5. **Seamless Index Rollover:** The rollover action is atomic, allowing write traffic to transition smoothly from an old index to a new one with no data loss or service interruption.

#### **5. Potential Impact**

* **Data Migration:** A strategy will be needed to manage existing daily indices. They can be added to a separate ILM policy that only contains a delete phase, or they can be removed manually after they expire.
* **Configuration Changes:** The project's configuration files will need to be updated, replacing the old index prefix (e.g., `skywalking_segment-20250724`) with the new write alias (e.g., `skywalking_segment`).

**Conclusion:** This optimization is a critical step to ensure the system remains performant and highly available as data volumes continue to scale. We strongly recommend that the core development team evaluate and adopt this proposal.

### Use case

Data Storage, Logging Module, Elasticsearch Integration

### Related issues

Optimize storage

### Are you willing to submit a pull request to implement this on your own?

- [ ] Yes I am willing to submit a pull request on my own!

### Code of Conduct

- [x] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
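To make Step 4 concrete at the request level, here is a minimal sketch using generic Elasticsearch APIs (not SkyWalking's internal client abstraction); the document body and query are placeholders rather than the real segment schema:

```json
POST skywalking_segment/_doc
{
  "message": "example segment document"
}

GET skywalking_segment/_search
{
  "query": {
    "match_all": {}
  }
}

GET _alias/skywalking_segment
```

The last request lists the backing indices behind the alias and shows which one currently holds `is_write_index: true`; after a rollover, writes to the alias transparently land in the new index, while the search above continues to span all backing indices that are still retained.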