J-HowHuang opened a new pull request, #16136:
URL: https://github.com/apache/pinot/pull/16136

   # Enhanced Tenant Rebalance Result with Aggregated Summary
   
   ## Summary
   
   This PR enhances the tenant rebalance API to provide aggregated summaries 
across all tables within a tenant, giving users a comprehensive view of the 
rebalance operation's impact at the tenant level. Previously, the API only 
returned individual table rebalance results, making it difficult to understand 
the overall tenant-level changes.
   
   ## Changes
   
   ### 🔧 Core Enhancements
   
   1. **Enhanced `TenantRebalanceResult`** - Added aggregated summary fields:
      - `totalTables`: Total number of tables processed
      - `statusSummary`: Count of tables by status (DONE, NO_OP, FAILED, etc.)
      - `aggregatedPreChecksResult`: Aggregated pre-check results across all 
tables
      - `aggregatedRebalanceSummary`: Consolidated rebalance summary with 
server and segment information
   
   2. **Smart Aggregation Logic** - Aggregates resources that overlap across 
tables instead of simply summing per-table values:
      - Handles servers that appear in multiple tables
      - Correctly aggregates segment counts and data movement estimates
      - Preserves individual table details while providing tenant-level insights
   
   3. **Comprehensive Test Coverage** - Added extensive tests covering:
      - Basic aggregation scenarios
      - Complex overlapping server scenarios  
      - Edge cases and validation
   
   ### 📊 New Output Format
   
   The enhanced API now returns both individual table results and aggregated 
tenant-level summaries:
   
   ```json
   {
     "jobId": "tenant-rebalance-12345",
     "totalTables": 3,
     "statusSummary": {
       "DONE": 2,
       "NO_OP": 1
     },
     "aggregatedRebalanceSummary": {
       "serverInfo": {
         "numServersGettingNewSegments": 4,
         "numServers": {
           "valueBeforeRebalance": 9,
           "expectedValueAfterRebalance": 10
         },
         "serversAdded": ["server4", "server5"],
         "serversRemoved": ["server6"],
         "serversUnchanged": ["server1", "server2", "server3", "server7", "server8"],
         "serversGettingNewSegments": ["server4", "server5", "server7", "server8"],
         "serverSegmentChangeInfo": {
           "server1": {
             "serverStatus": "UNCHANGED",
             "totalSegmentsBeforeRebalance": 10,
             "totalSegmentsAfterRebalance": 10,
             "segmentsAdded": 0,
             "segmentsDeleted": 0,
             "segmentsUnchanged": 10
           }
           // ... more servers
         }
       },
       "segmentInfo": {
         "totalSegmentsToBeMoved": 18,
         "totalSegmentsToBeDeleted": 8,
         "maxSegmentsAddedToASingleServer": 5,
         "estimatedAverageSegmentSizeInBytes": 1444,
         "totalEstimatedDataToBeMovedInBytes": 26000,
         "numSegmentsInSingleReplica": {
           "valueBeforeRebalance": 32,
           "expectedValueAfterRebalance": 32
         },
         "numSegmentsAcrossAllReplicas": {
           "valueBeforeRebalance": 74,
           "expectedValueAfterRebalance": 84
         }
       },
       "tagsInfo": [
         {
           "tagName": "TestTenant_OFFLINE",
           "numSegmentsToDownload": 10,
           "numSegmentsUnchanged": 42,
           "numServerParticipants": 7
         },
         {
           "tagName": "TestTenant_REALTIME", 
           "numSegmentsToDownload": 8,
           "numSegmentsUnchanged": 24,
           "numServerParticipants": 3
         }
       ]
     },
     "rebalanceTableResults": {
       "tableA_OFFLINE": {
         "jobId": "tableA_job",
         "status": "DONE",
         "description": "Table A rebalanced successfully"
         // ... individual table details
       }
       // ... more tables
     }
   }
   ```
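
As a rough illustration of how `totalTables` and `statusSummary` in the output above relate to `rebalanceTableResults`, the Python sketch below counts tables by status. It is not the PR's Java implementation: the function name `summarize_statuses`, the dict layout, and the tables other than `tableA_OFFLINE` are assumptions made for illustration.

```python
from collections import Counter


def summarize_statuses(table_results):
    """Return (totalTables, statusSummary) for a mapping of table name -> result dict.

    Hypothetical helper: mirrors the idea of the summary fields, not the PR's code.
    """
    counts = Counter(result["status"] for result in table_results.values())
    return len(table_results), dict(counts)


# Illustrative per-table results; only the "status" field is used here.
results = {
    "tableA_OFFLINE": {"status": "DONE"},
    "tableB_OFFLINE": {"status": "DONE"},
    "tableC_REALTIME": {"status": "NO_OP"},
}
total_tables, status_summary = summarize_statuses(results)
print(total_tables, status_summary)
```

With these three illustrative tables, the counts match the `totalTables` and `statusSummary` values shown in the sample output.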
   
   ## 🏗️ Special Aggregation Rules
   
   Several fields use **derived aggregation** logic instead of simple summation:
   
   ### 1. **Server Status Aggregation**
   - When a server appears in multiple tables with different statuses, the most 
significant status takes precedence:
     - `REMOVED` > `ADDED` > `UNCHANGED`
   - Example: If server1 is `UNCHANGED` in tableA but `REMOVED` from tableB, 
its aggregated status becomes `REMOVED`
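
The precedence rule above can be sketched as a small ranking lookup. This is a minimal Python illustration, assuming the status names are plain strings; `PRECEDENCE` and `aggregate_server_status` are hypothetical names, not identifiers from the PR.

```python
# Higher rank wins; the ordering follows the rule REMOVED > ADDED > UNCHANGED.
PRECEDENCE = {"UNCHANGED": 0, "ADDED": 1, "REMOVED": 2}


def aggregate_server_status(statuses):
    """Pick the most significant status among a server's per-table statuses."""
    return max(statuses, key=PRECEDENCE.__getitem__)


# server1 is UNCHANGED in tableA but REMOVED from tableB.
server1_status = aggregate_server_status(["UNCHANGED", "REMOVED"])
print(server1_status)
```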
   
   ### 2. **Max Segments Added to Single Server**
   - **Derived from aggregated server data** rather than table-level maximums
   - Calculates the maximum across all servers' total segments added (summed 
across all their tables)
   - More accurate than taking the max of the per-table maximums, which 
undercounts a server that receives segments from several tables
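
To make the difference concrete, the sketch below sums segments added per server across tables before taking the maximum, and compares that with the max of the per-table maximums. The input layout (table name -> server -> segments added) and all numbers are assumptions chosen for illustration, and `max_segments_added` is a hypothetical name.

```python
from collections import defaultdict


def max_segments_added(per_table_added):
    """Max over servers of the segments added to that server, summed across tables."""
    totals = defaultdict(int)
    for added_by_server in per_table_added.values():
        for server, added in added_by_server.items():
            totals[server] += added
    return max(totals.values(), default=0)


# Illustrative data: server4 receives segments from both tables.
per_table_added = {
    "tableA_OFFLINE": {"server4": 3, "server5": 2},
    "tableB_OFFLINE": {"server4": 2, "server7": 1},
}
derived_max = max_segments_added(per_table_added)
naive_max = max(max(added.values()) for added in per_table_added.values())
print(derived_max, naive_max)
```

Here the derived value is 5 (server4 gets 3 + 2 segments), while the max of the per-table maximums is only 3.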
   
   ### 3. **Average Segment Size Calculation**
   - **Weighted average** based on total segments moved and total data moved
   - Formula: `totalEstimatedDataToBeMovedInBytes / totalSegmentsToBeMoved`
   - Provides an accurate tenant-level average rather than the simple 
arithmetic mean of the table averages
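
A short worked example of the formula, using the totals from the sample output (26000 bytes over 18 segments). The per-table split, the use of integer division, and the zero-segment guard are assumptions of this sketch; the PR's rounding and edge-case behavior may differ.

```python
def average_segment_size(total_bytes_to_move, total_segments_to_move):
    """Tenant-level average: total data moved divided by total segments moved.

    Integer division and the zero guard are assumptions of this sketch.
    """
    if total_segments_to_move == 0:
        return 0
    return total_bytes_to_move // total_segments_to_move


# Hypothetical per-table figures as (segments moved, bytes moved).
tables = [(10, 10000), (8, 16000)]
total_segments = sum(segments for segments, _ in tables)
total_bytes = sum(size for _, size in tables)

weighted_average = average_segment_size(total_bytes, total_segments)
mean_of_table_averages = sum(size // segments for segments, size in tables) // len(tables)
print(weighted_average, mean_of_table_averages)
```

With this split the weighted average is 1444 bytes, whereas averaging the two table averages (1000 and 2000) would give 1500.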
   
   ### 4. **Server Count Calculations**
   - Uses **set operations** to handle overlapping servers between tables
   - `numServers.valueBeforeRebalance`: Count of unique servers with segments 
before rebalance
   - `numServers.expectedValueAfterRebalance`: Count of unique servers with 
segments after rebalance
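
The set-based counting can be sketched as a union over each table's server lists. The input layout (`before` and `after` lists per table) and the function name `count_unique_servers` are assumptions made for illustration.

```python
def count_unique_servers(per_table_servers):
    """Return (unique servers before, unique servers after) across all tables."""
    before, after = set(), set()
    for servers in per_table_servers.values():
        before |= set(servers["before"])
        after |= set(servers["after"])
    return len(before), len(after)


# Illustrative data: server2 and server4 are shared by both tables.
per_table_servers = {
    "tableA_OFFLINE": {
        "before": ["server1", "server2"],
        "after": ["server1", "server2", "server4"],
    },
    "tableB_OFFLINE": {
        "before": ["server2", "server6"],
        "after": ["server2", "server4"],
    },
}
num_before, num_after = count_unique_servers(per_table_servers)
print(num_before, num_after)
```

Summing the per-table counts would give 4 before and 5 after; the union counts each shared server once, giving 3 and 3.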
   
   ### 5. **Tag Participant Counting**
   - Aggregates unique servers across all tables using the same tag
   - Accounts for servers that may have been removed from some tables but still 
exist in others
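
One way to picture the tag participant count is as a per-tag union of servers across tables. In the sketch below, a server counts as a participant for a tag if it serves at least one table under that tag; this reading, the input layout, and the name `count_tag_participants` are assumptions, and the PR's exact definition may differ.

```python
from collections import defaultdict


def count_tag_participants(per_table_tag_servers):
    """Count unique servers per tag across all tables using that tag."""
    servers_by_tag = defaultdict(set)
    for tag_servers in per_table_tag_servers.values():
        for tag, servers in tag_servers.items():
            servers_by_tag[tag].update(servers)
    return {tag: len(servers) for tag, servers in servers_by_tag.items()}


# Illustrative data: server2 is dropped from tableA but still serves tableB,
# so it still counts once toward TestTenant_OFFLINE.
per_table_tag_servers = {
    "tableA_OFFLINE": {"TestTenant_OFFLINE": ["server1"]},
    "tableB_OFFLINE": {"TestTenant_OFFLINE": ["server2", "server3"]},
    "tableC_REALTIME": {"TestTenant_REALTIME": ["server7"]},
}
participants = count_tag_participants(per_table_tag_servers)
print(participants)
```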
   
   ## 🧪 Testing
   
   Added comprehensive test coverage including:
   
   1. **`testTenantRebalanceResultAggregation()`**
      - Basic aggregation across different table types (offline/realtime)
      - Scale-out, scale-in, and no-op scenarios
      - Status summary validation
   
   2. **`testTenantRebalanceResultAggregationWithOverlappingServers()`**
      - Complex scenario with servers appearing in multiple tables
      - Validates correct handling of server status precedence
      - Tests derived calculations with overlapping resources
   
   ## 🔄 Backward Compatibility
   
   - **Fully backward compatible** - existing fields remain unchanged
   - New aggregated fields are **optional additions**
   - The `verboseResult` parameter controls whether individual table details are included
   - Existing API consumers continue to work without modification
   
   ## 📈 Benefits
   
   1. **Operational Visibility**: Clear tenant-level view of rebalance impact
   2. **Resource Planning**: Understand total data movement and server changes
   3. **Monitoring**: Easy status tracking across multiple tables
   4. **Performance Insights**: Aggregated metrics for capacity planning
   
   ## 🎯 Use Cases
   
   - **Tenant Scaling**: Understand the impact of adding servers to, or 
removing servers from, a tenant
   - **Capacity Planning**: Estimate data movement and resource requirements  
   - **Operations Dashboard**: Monitor tenant rebalance status and progress
   - **Automated Tooling**: Build automation based on aggregated metrics
   
   ## API Endpoint
   
   ```
   POST /tenants/{tenantName}/rebalance
   ```
   
   Query Parameters:
   - `verboseResult=true` - Include individual table details (default: false)
   - All existing rebalance parameters remain supported
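
For reference, a request to this endpoint could be assembled as sketched below. The code only builds the request and does not send it. The controller address `http://localhost:9000`, the tenant name, and the helper name `build_tenant_rebalance_request` are assumptions; any request body or additional parameters the endpoint expects are omitted.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request


def build_tenant_rebalance_request(controller_url, tenant_name, verbose_result=False):
    """Build (but do not send) a POST request for the tenant rebalance endpoint."""
    query = urlencode({"verboseResult": str(verbose_result).lower()})
    base = controller_url.rstrip("/")
    url = f"{base}/tenants/{quote(tenant_name, safe='')}/rebalance?{query}"
    return Request(url, method="POST")


# Assumed controller address and tenant name, for illustration only.
request = build_tenant_rebalance_request(
    "http://localhost:9000", "TestTenant", verbose_result=True
)
print(request.get_method(), request.full_url)
```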
   
   ---
   
   This enhancement provides crucial tenant-level insights while maintaining 
full backward compatibility with existing integrations.

