DomGarguilo commented on issue #4815:
URL: https://github.com/apache/accumulo/issues/4815#issuecomment-2313386004

   I am trying to think of what the best solution here is. Given the number of 
metrics that we have, it seems like a table might get too big (readers would 
have to scroll back up to check the column titles). 
   
   Another option would be to have just a "list" of the metrics, each with its 
own header, with the info about each one under it. That way we could also 
break the metrics up into sections like `general server metrics` or `compactor 
metrics`, as we do now with the comments. This might make things more readable.
   
   Here is what that might look like as an example:
   <details>
     <summary><strong>Click to expand</strong></summary>
   
   # General Server Metrics
   
   ## `metrics.server.idle`
   - **Type:** Gauge  
   - **Description:**  
     Indicates if the server is idle (1 = idle, 0 = not idle).  
   - **Related Properties:**  
     - `accumulo.server.idle.timeout`: Influences how long the server waits 
before becoming idle. Longer timeouts lead to longer periods of non-idleness.
   - **Conditions to Monitor:**  
     - Extended periods of idleness during high usage might indicate issues.
   - **Recommended Actions:**  
     - Review system logs for potential issues or unexpected activity.
   
   # Compactor Metrics
   
   ## `metrics.compactor.majc.stuck`
   - **Type:** LongTaskTimer  
   - **Description:**  
     Tracks the duration of major compaction tasks that get stuck.  
   - **Related Properties:**  
     - `accumulo.compactor.max.running.tasks`: Influences how many compaction 
tasks can run concurrently. A higher value could increase the chance of 
compactions getting stuck under resource pressure.
   - **Conditions to Monitor:**  
     - Long task durations without completion may indicate resource contention, 
particularly with disk I/O.
   - **Recommended Actions:**  
     - Check disk usage and resource allocation. High-load systems may require 
tuning.
   
   ## `metrics.compactor.entries.read`
   - **Type:** FunctionCounter  
   - **Description:**  
     Counts the number of entries read by all threads performing compactions.  
   - **Related Properties:**  
     - `accumulo.compactor.threadpool.size`: Affects how quickly entries can be 
read. A larger thread pool can speed up the reading process but may consume 
more system resources.
   - **Conditions to Monitor:**  
     - Low read count during periods of expected high compaction activity.
   - **Recommended Actions:**  
     - Ensure that the compactor thread pool is properly sized for the workload.
   
   # Fate Metrics
   
   ## `metrics.fate.ops`
   - **Type:** Gauge  
   - **Description:**  
     Tracks the number of current Fate operations in any state.  
   - **Related Properties:**  
     - `accumulo.fate.max.transactions`: Limits the number of concurrent Fate 
operations. Higher limits allow for more transactions but may also increase the 
risk of contention or failure under high load.
   - **Conditions to Monitor:**  
     - High number of operations in progress could signal stuck or delayed 
transactions.
   - **Recommended Actions:**  
     - Investigate if operations are stuck or taking too long to complete.
   </details>
   
   I am not sure of the best way to structure the raw text for this list, 
though. It may turn out that keeping it in a table for development works 
better, with that table then rendered out to this list. Not too sure.
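   
   To make the "table for development, rendered out to a list" idea concrete, 
here is a minimal sketch in Java. Everything in it is an assumption for 
illustration: the `MetricDoc` record, the `MetricDocRenderer` class, and the 
choice of columns are hypothetical and are not existing Accumulo code. Each row 
plays the role of one table line, and `render` groups rows by section and emits 
the header-per-metric layout from the example above.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MetricDocRenderer {

  /** One row of the hypothetical development-time table (columns are assumed). */
  public record MetricDoc(String section, String name, String type, String description) {}

  /** Renders rows as a Markdown list, grouped by section in first-seen order. */
  public static String render(List<MetricDoc> rows) {
    // LinkedHashMap keeps sections in the order they first appear in the table.
    Map<String, StringBuilder> sections = new LinkedHashMap<>();
    for (MetricDoc row : rows) {
      StringBuilder body = sections.computeIfAbsent(row.section(), s -> new StringBuilder());
      body.append("## `").append(row.name()).append("`\n");
      body.append("- **Type:** ").append(row.type()).append("\n");
      body.append("- **Description:** ").append(row.description()).append("\n\n");
    }
    StringBuilder out = new StringBuilder();
    sections.forEach((title, body) -> out.append("# ").append(title).append("\n\n").append(body));
    return out.toString();
  }

  public static void main(String[] args) {
    // Rows reuse the illustrative metrics from the example in this comment.
    List<MetricDoc> rows = List.of(
        new MetricDoc("General Server Metrics", "metrics.server.idle", "Gauge",
            "Indicates if the server is idle (1 = idle, 0 = not idle)."),
        new MetricDoc("Fate Metrics", "metrics.fate.ops", "Gauge",
            "Tracks the number of current Fate operations in any state."));
    System.out.print(render(rows));
  }
}
```

   Further columns (related properties, conditions to monitor, recommended 
actions) could be added to the record in the same way. The point of the sketch 
is only that the flat rows stay easy to edit while the rendered output keeps 
the per-metric headers.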
   
   I am interested in hearing others' thoughts on the table vs. list argument 
though.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
