kevinrr888 commented on issue #4498:
URL: https://github.com/apache/accumulo/issues/4498#issuecomment-2101224458

It seems like the METRICS are logged every second (configured by `cfg.setProperty("general.custom.metrics.opts.logging.step", "1s");` in MetricsIT) but only updated every 5 seconds (configured by `private static final long DEFAULT_MIN_REFRESH_DELAY = TimeUnit.SECONDS.toMillis(5);` in FateMetrics). This makes the METRICS values look wrong at the time they are logged in MetricsIT. Because this is a fast test, most log cycles fall between two refreshes. After reducing the update interval to one second, values other than 0 show up. Here is one cycle of the logged metrics, showing a submitted transaction:
   
```
2024-05-07T16:05:07,695 [accumulo.METRICS] INFO : accumulo.fate.errors{host=groot,instance.name=miniInstance,port=35101,process.name=manager,type=zk.connection} value=0
2024-05-07T16:05:07,695 [accumulo.METRICS] INFO : accumulo.fate.ops.activity{host=groot,instance.name=miniInstance,port=35101,process.name=manager} value=1
2024-05-07T16:05:07,695 [accumulo.METRICS] INFO : accumulo.fate.ops.in.progress{host=groot,instance.name=miniInstance,port=35101,process.name=manager} value=1
2024-05-07T16:05:07,696 [accumulo.METRICS] INFO : accumulo.fate.tx{host=groot,instance.name=miniInstance,port=35101,process.name=manager,state=failed_in_progress} value=0
2024-05-07T16:05:07,696 [accumulo.METRICS] INFO : accumulo.fate.tx{host=groot,instance.name=miniInstance,port=35101,process.name=manager,state=in_progress} value=0
2024-05-07T16:05:07,696 [accumulo.METRICS] INFO : accumulo.fate.tx{host=groot,instance.name=miniInstance,port=35101,process.name=manager,state=new} value=0
2024-05-07T16:05:07,697 [accumulo.METRICS] INFO : accumulo.fate.tx{host=groot,instance.name=miniInstance,port=35101,process.name=manager,state=unknown} value=0
2024-05-07T16:05:07,697 [accumulo.METRICS] INFO : accumulo.fate.tx{host=groot,instance.name=miniInstance,port=35101,process.name=manager,state=successful} value=0
2024-05-07T16:05:07,697 [accumulo.METRICS] INFO : accumulo.fate.tx{host=groot,instance.name=miniInstance,port=35101,process.name=manager,state=failed} value=0
2024-05-07T16:05:07,697 [accumulo.METRICS] INFO : accumulo.fate.tx{host=groot,instance.name=miniInstance,port=35101,process.name=manager,state=submitted} value=1
```
   
So it appears that the values of 0 are correct; they are just updated less often than they are logged, which makes a count of 0 show up more often than it actually occurs. For this test, perhaps we could increase the logging step to 5 seconds, or decrease the refresh delay to 1 second, so the logs show more clearly what is actually happening and when.
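
To make the effect concrete, here is a small self-contained simulation. It is a sketch only and is not taken from MetricsIT or FateMetrics. A gauge is re-read from the underlying state every `refreshSeconds` and sampled by a logger every `logStepSeconds`. The class and method names are hypothetical. With a 5-second refresh and a 1-second log step, a transaction that exists only during seconds 6 and 7 never reaches the log. With a 1-second refresh, it does.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model: a cached gauge refreshed every refreshSeconds,
// sampled by a logger every logStepSeconds (one array slot per second).
class RefreshVsLogStep {

  /** Returns the values the logger would emit over the simulated window. */
  static List<Long> sampledValues(long[] trueValuePerSecond, int refreshSeconds,
      int logStepSeconds) {
    List<Long> logged = new ArrayList<>();
    long cached = 0; // what the logger sees until the next refresh
    for (int t = 0; t < trueValuePerSecond.length; t++) {
      if (t % refreshSeconds == 0) {
        cached = trueValuePerSecond[t]; // refresh reads the current state
      }
      if (t % logStepSeconds == 0) {
        logged.add(cached); // logger emits the last cached value, possibly stale
      }
    }
    return logged;
  }

  public static void main(String[] args) {
    // A transaction is "submitted" only during seconds 6 and 7 of a 10-second window.
    long[] submitted = {0, 0, 0, 0, 0, 0, 1, 1, 0, 0};
    System.out.println("5s refresh: " + sampledValues(submitted, 5, 1));
    System.out.println("1s refresh: " + sampledValues(submitted, 1, 1));
  }
}
```

Either change to the test narrows the gap between the two intervals. Matching them (both 1s or both 5s) means each logged value corresponds to exactly one refresh.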
   
One thing I noticed while looking at this: `accumulo.fate.ops.in.progress` is the sum of all current transactions (failed_in_progress, in_progress, new, unknown, successful, failed, and submitted). The name "in.progress" may be a bit misleading, since here it means something different from the state "in_progress". Maybe something like `accumulo.fate.ops.total` or `accumulo.fate.ops.current.ops` would be clearer? Or was this intended to include only transactions that are in_progress? This concern doesn't seem to apply to `accumulo.fate.ops.in.progress.by.type`, since that metric only includes transactions that are in_progress.
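
The naming concern can be shown with a toy model. It assumes, as the logs above suggest, that the gauge counts every tracked transaction regardless of state. The `TxState` enum below just mirrors the `state` tag values in the logs and is not Accumulo's own type. The example reproduces the 16:05:07 cycle: one submitted transaction and nothing in in_progress. That cycle yields 1 under the current definition and 0 under an in_progress-only one.

```java
import java.util.List;

class InProgressVsTotal {

  // Illustrative only: mirrors the accumulo.fate.tx "state" tag values.
  enum TxState { NEW, SUBMITTED, IN_PROGRESS, FAILED_IN_PROGRESS, FAILED, SUCCESSFUL, UNKNOWN }

  /** What ops.in.progress appears to report: every tracked transaction. */
  static long totalTracked(List<TxState> txns) {
    return txns.size();
  }

  /** What the name suggests: only transactions in the IN_PROGRESS state. */
  static long inProgressOnly(List<TxState> txns) {
    return txns.stream().filter(s -> s == TxState.IN_PROGRESS).count();
  }

  public static void main(String[] args) {
    // Same shape as the 16:05:07 cycle: submitted=1, every other state 0.
    List<TxState> cycle = List.of(TxState.SUBMITTED);
    System.out.println("total: " + totalTracked(cycle) + ", in_progress only: "
        + inProgressOnly(cycle));
  }
}
```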
   
Another thing I noticed is that the `accumulo.fate.ops.in.progress.by.type` metric does not seem to be reset between updates. For example, the following two log cycles are 1 second apart:
   
```
2024-05-08T10:32:56,013 [accumulo.METRICS] INFO : accumulo.fate.errors{host=groot,instance.name=miniInstance,port=34595,process.name=manager,type=zk.connection} value=0
2024-05-08T10:32:56,013 [accumulo.METRICS] INFO : accumulo.fate.ops.activity{host=groot,instance.name=miniInstance,port=34595,process.name=manager} value=3
2024-05-08T10:32:56,013 [accumulo.METRICS] INFO : accumulo.fate.ops.in.progress{host=groot,instance.name=miniInstance,port=34595,process.name=manager} value=1
2024-05-08T10:32:56,013 [accumulo.METRICS] INFO : accumulo.fate.ops.in.progress.by.type{host=groot,instance.name=miniInstance,op.type=TABLE_COMPACT,port=34595,process.name=manager} value=1
2024-05-08T10:32:56,013 [accumulo.METRICS] INFO : accumulo.fate.tx{host=groot,instance.name=miniInstance,port=34595,process.name=manager,state=unknown} value=0
2024-05-08T10:32:56,013 [accumulo.METRICS] INFO : accumulo.fate.tx{host=groot,instance.name=miniInstance,port=34595,process.name=manager,state=successful} value=0
2024-05-08T10:32:56,013 [accumulo.METRICS] INFO : accumulo.fate.tx{host=groot,instance.name=miniInstance,port=34595,process.name=manager,state=failed} value=0
2024-05-08T10:32:56,013 [accumulo.METRICS] INFO : accumulo.fate.tx{host=groot,instance.name=miniInstance,port=34595,process.name=manager,state=submitted} value=0
2024-05-08T10:32:56,013 [accumulo.METRICS] INFO : accumulo.fate.tx{host=groot,instance.name=miniInstance,port=34595,process.name=manager,state=failed_in_progress} value=0
2024-05-08T10:32:56,013 [accumulo.METRICS] INFO : accumulo.fate.tx{host=groot,instance.name=miniInstance,port=34595,process.name=manager,state=new} value=0
2024-05-08T10:32:56,013 [accumulo.METRICS] INFO : accumulo.fate.tx{host=groot,instance.name=miniInstance,port=34595,process.name=manager,state=in_progress} value=1

2024-05-08T10:32:57,014 [accumulo.METRICS] INFO : accumulo.fate.errors{host=groot,instance.name=miniInstance,port=34595,process.name=manager,type=zk.connection} value=0
2024-05-08T10:32:57,014 [accumulo.METRICS] INFO : accumulo.fate.ops.activity{host=groot,instance.name=miniInstance,port=34595,process.name=manager} value=4
2024-05-08T10:32:57,014 [accumulo.METRICS] INFO : accumulo.fate.ops.in.progress{host=groot,instance.name=miniInstance,port=34595,process.name=manager} value=0
2024-05-08T10:32:57,014 [accumulo.METRICS] INFO : accumulo.fate.ops.in.progress.by.type{host=groot,instance.name=miniInstance,op.type=TABLE_COMPACT,port=34595,process.name=manager} value=1
2024-05-08T10:32:57,014 [accumulo.METRICS] INFO : accumulo.fate.tx{host=groot,instance.name=miniInstance,port=34595,process.name=manager,state=unknown} value=0
2024-05-08T10:32:57,014 [accumulo.METRICS] INFO : accumulo.fate.tx{host=groot,instance.name=miniInstance,port=34595,process.name=manager,state=successful} value=0
2024-05-08T10:32:57,014 [accumulo.METRICS] INFO : accumulo.fate.tx{host=groot,instance.name=miniInstance,port=34595,process.name=manager,state=failed} value=0
2024-05-08T10:32:57,014 [accumulo.METRICS] INFO : accumulo.fate.tx{host=groot,instance.name=miniInstance,port=34595,process.name=manager,state=submitted} value=0
2024-05-08T10:32:57,014 [accumulo.METRICS] INFO : accumulo.fate.tx{host=groot,instance.name=miniInstance,port=34595,process.name=manager,state=failed_in_progress} value=0
2024-05-08T10:32:57,014 [accumulo.METRICS] INFO : accumulo.fate.tx{host=groot,instance.name=miniInstance,port=34595,process.name=manager,state=new} value=0
2024-05-08T10:32:57,014 [accumulo.METRICS] INFO : accumulo.fate.tx{host=groot,instance.name=miniInstance,port=34595,process.name=manager,state=in_progress} value=0
```
   
In the first cycle, 1 transaction is in_progress, running TABLE_COMPACT. In the second cycle, 0 transactions are in_progress, yet `accumulo.fate.ops.in.progress.by.type` still reports 1 for TABLE_COMPACT, which is inconsistent. From reading the code and some experimentation, this doesn't appear to be caused by the transaction still being in_progress in ZK. Instead, the old value seems to simply be logged again. I can't narrow down exactly why; someone more familiar with the metrics code might understand it better.
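
One possible explanation follows. It is purely a hypothesis and is not taken from the FateMetrics source. Suppose a gauge is registered per op type the first time that type is seen, and each refresh only writes the types present in the current snapshot. A type that then disappears from the snapshot keeps its last value. The sketch below models that with hypothetical names (`PerTypeGauges`, `updateWithoutReset`, `updateWithReset`). It shows that zeroing every known type before applying the snapshot makes TABLE_COMPACT drop back to 0.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical per-op-type gauges: one AtomicLong per op type, created the
// first time the type is seen and read later by the metrics logger.
class PerTypeGauges {
  private final Map<String, AtomicLong> gauges = new ConcurrentHashMap<>();

  /** Stale variant: only types present in the current snapshot are written. */
  void updateWithoutReset(Map<String, Long> inProgressByType) {
    inProgressByType.forEach(
        (type, count) -> gauges.computeIfAbsent(type, t -> new AtomicLong()).set(count));
  }

  /** Reset variant: zero every known type first, then apply the snapshot. */
  void updateWithReset(Map<String, Long> inProgressByType) {
    gauges.values().forEach(g -> g.set(0));
    updateWithoutReset(inProgressByType);
  }

  /** The value a logger would report for this type (0 if never registered). */
  long read(String type) {
    AtomicLong g = gauges.get(type);
    return g == null ? 0 : g.get();
  }

  public static void main(String[] args) {
    PerTypeGauges stale = new PerTypeGauges();
    stale.updateWithoutReset(Map.of("TABLE_COMPACT", 1L)); // cycle 1: compaction running
    stale.updateWithoutReset(Map.of());                    // cycle 2: nothing in progress
    System.out.println("without reset: " + stale.read("TABLE_COMPACT"));

    PerTypeGauges reset = new PerTypeGauges();
    reset.updateWithReset(Map.of("TABLE_COMPACT", 1L));
    reset.updateWithReset(Map.of());
    System.out.println("with reset: " + reset.read("TABLE_COMPACT"));
  }
}
```

Resetting and then writing is not atomic. A logger sampling between the two steps could briefly see 0 for a type that is still running. For a coarse gauge refreshed every few seconds, that is probably acceptable.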
   
And one question: in elasticity, these metrics are obtained only by looking at FATE transactions in ZK (`List<AdminUtil.TransactionStatus> currFates = admin.getTransactionStatus(Map.of(FateInstanceType.META, metaFateStore), null, null, null);` in FateMetricValues). Should this also look at the UserFateStore (the Accumulo FATE table) for transactions?

