kevinrr888 commented on issue #4498:
URL: https://github.com/apache/accumulo/issues/4498#issuecomment-2101224458
It seems like the METRICS are logged every second (configured by
`cfg.setProperty("general.custom.metrics.opts.logging.step", "1s");` in
MetricsIT) but only updated every 5 seconds (configured by `private static
final long DEFAULT_MIN_REFRESH_DELAY = TimeUnit.SECONDS.toMillis(5);` in
FateMetrics). This makes it look like the METRICS values are wrong at the time
they are logged in MetricsIT. Since this is a fast test, if the update interval
is reduced to every second we can see values other than 0. Here's one cycle of
logged metrics showing a submitted transaction:
```
2024-05-07T16:05:07,695 [accumulo.METRICS] INFO :
accumulo.fate.errors{host=groot,instance.name=miniInstance,port=35101,process.name=manager,type=zk.connection}
value=0
2024-05-07T16:05:07,695 [accumulo.METRICS] INFO :
accumulo.fate.ops.activity{host=groot,instance.name=miniInstance,port=35101,process.name=manager}
value=1
2024-05-07T16:05:07,695 [accumulo.METRICS] INFO :
accumulo.fate.ops.in.progress{host=groot,instance.name=miniInstance,port=35101,process.name=manager}
value=1
2024-05-07T16:05:07,696 [accumulo.METRICS] INFO :
accumulo.fate.tx{host=groot,instance.name=miniInstance,port=35101,process.name=manager,state=failed_in_progress}
value=0
2024-05-07T16:05:07,696 [accumulo.METRICS] INFO :
accumulo.fate.tx{host=groot,instance.name=miniInstance,port=35101,process.name=manager,state=in_progress}
value=0
2024-05-07T16:05:07,696 [accumulo.METRICS] INFO :
accumulo.fate.tx{host=groot,instance.name=miniInstance,port=35101,process.name=manager,state=new}
value=0
2024-05-07T16:05:07,697 [accumulo.METRICS] INFO :
accumulo.fate.tx{host=groot,instance.name=miniInstance,port=35101,process.name=manager,state=unknown}
value=0
2024-05-07T16:05:07,697 [accumulo.METRICS] INFO :
accumulo.fate.tx{host=groot,instance.name=miniInstance,port=35101,process.name=manager,state=successful}
value=0
2024-05-07T16:05:07,697 [accumulo.METRICS] INFO :
accumulo.fate.tx{host=groot,instance.name=miniInstance,port=35101,process.name=manager,state=failed}
value=0
2024-05-07T16:05:07,697 [accumulo.METRICS] INFO :
accumulo.fate.tx{host=groot,instance.name=miniInstance,port=35101,process.name=manager,state=submitted}
value=1
```
So, it appears that the values of 0 are correct; they are just not updated
as frequently as they are logged, making it seem like the count of 0 occurs
more often than it actually does. Perhaps for this test we could increase the
logging interval to 5 seconds, or decrease the update interval to 1 second, so
it is clearer in the logs what is actually happening at what time.
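For example, a one-line change along these lines in MetricsIT (just a sketch; the property name is the one quoted above) would make the logging step match the existing 5 second refresh delay:
```java
// Sketch only: log at the same rate FateMetrics refreshes (every 5s),
// so each logged value reflects a fresh update.
cfg.setProperty("general.custom.metrics.opts.logging.step", "5s");
```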
One thing I noticed while looking at this: `accumulo.fate.ops.in.progress` is
a sum of all current txns across every state (failed_in_progress, in_progress,
new, unknown, successful, failed, and submitted). The "in.progress" in the name
may be a bit misleading, since it means something different from the state
"in_progress". Maybe something like `accumulo.fate.ops.total` or
`accumulo.fate.ops.current.ops` would be clearer? Or maybe this was intended to
only include txns that are in_progress? This doesn't apply to
`accumulo.fate.ops.in.progress.by.type`, since that metric only includes txns
that are in_progress.
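For what it's worth, a rename would presumably just be a different name passed when the gauge is registered. A rough sketch of what that could look like with Micrometer (which Accumulo uses for metrics); the class and field names here are illustrative, not the existing FateMetrics code:
```java
import io.micrometer.core.instrument.Gauge;
import io.micrometer.core.instrument.MeterRegistry;
import java.util.concurrent.atomic.AtomicLong;

// Illustrative sketch only; "FateTotalGaugeSketch" and "totalFateOps" are
// hypothetical names, not existing Accumulo code.
public class FateTotalGaugeSketch {
  private final AtomicLong totalFateOps = new AtomicLong(0);

  public void registerMetrics(MeterRegistry registry) {
    // Same kind of gauge, just registered under a name that does not
    // collide with the "in_progress" state.
    Gauge.builder("accumulo.fate.ops.total", totalFateOps, AtomicLong::get)
        .description("Total number of current FATE txns across all states")
        .register(registry);
  }

  // Called from the periodic refresh with the sum over all states.
  public void update(long total) {
    totalFateOps.set(total);
  }
}
```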
Another thing I noticed: it seems like the
`accumulo.fate.ops.in.progress.by.type` metric is not reset between updates.
For example, the following two cycles of logs are 1 second apart:
```
2024-05-08T10:32:56,013 [accumulo.METRICS] INFO :
accumulo.fate.errors{host=groot,instance.name=miniInstance,port=34595,process.name=manager,type=zk.connection}
value=0
2024-05-08T10:32:56,013 [accumulo.METRICS] INFO :
accumulo.fate.ops.activity{host=groot,instance.name=miniInstance,port=34595,process.name=manager}
value=3
2024-05-08T10:32:56,013 [accumulo.METRICS] INFO :
accumulo.fate.ops.in.progress{host=groot,instance.name=miniInstance,port=34595,process.name=manager}
value=1
2024-05-08T10:32:56,013 [accumulo.METRICS] INFO :
accumulo.fate.ops.in.progress.by.type{host=groot,instance.name=miniInstance,op.type=TABLE_COMPACT,port=34595,process.name=manager}
value=1
2024-05-08T10:32:56,013 [accumulo.METRICS] INFO :
accumulo.fate.tx{host=groot,instance.name=miniInstance,port=34595,process.name=manager,state=unknown}
value=0
2024-05-08T10:32:56,013 [accumulo.METRICS] INFO :
accumulo.fate.tx{host=groot,instance.name=miniInstance,port=34595,process.name=manager,state=successful}
value=0
2024-05-08T10:32:56,013 [accumulo.METRICS] INFO :
accumulo.fate.tx{host=groot,instance.name=miniInstance,port=34595,process.name=manager,state=failed}
value=0
2024-05-08T10:32:56,013 [accumulo.METRICS] INFO :
accumulo.fate.tx{host=groot,instance.name=miniInstance,port=34595,process.name=manager,state=submitted}
value=0
2024-05-08T10:32:56,013 [accumulo.METRICS] INFO :
accumulo.fate.tx{host=groot,instance.name=miniInstance,port=34595,process.name=manager,state=failed_in_progress}
value=0
2024-05-08T10:32:56,013 [accumulo.METRICS] INFO :
accumulo.fate.tx{host=groot,instance.name=miniInstance,port=34595,process.name=manager,state=new}
value=0
2024-05-08T10:32:56,013 [accumulo.METRICS] INFO :
accumulo.fate.tx{host=groot,instance.name=miniInstance,port=34595,process.name=manager,state=in_progress}
value=1
2024-05-08T10:32:57,014 [accumulo.METRICS] INFO :
accumulo.fate.errors{host=groot,instance.name=miniInstance,port=34595,process.name=manager,type=zk.connection}
value=0
2024-05-08T10:32:57,014 [accumulo.METRICS] INFO :
accumulo.fate.ops.activity{host=groot,instance.name=miniInstance,port=34595,process.name=manager}
value=4
2024-05-08T10:32:57,014 [accumulo.METRICS] INFO :
accumulo.fate.ops.in.progress{host=groot,instance.name=miniInstance,port=34595,process.name=manager}
value=0
2024-05-08T10:32:57,014 [accumulo.METRICS] INFO :
accumulo.fate.ops.in.progress.by.type{host=groot,instance.name=miniInstance,op.type=TABLE_COMPACT,port=34595,process.name=manager}
value=1
2024-05-08T10:32:57,014 [accumulo.METRICS] INFO :
accumulo.fate.tx{host=groot,instance.name=miniInstance,port=34595,process.name=manager,state=unknown}
value=0
2024-05-08T10:32:57,014 [accumulo.METRICS] INFO :
accumulo.fate.tx{host=groot,instance.name=miniInstance,port=34595,process.name=manager,state=successful}
value=0
2024-05-08T10:32:57,014 [accumulo.METRICS] INFO :
accumulo.fate.tx{host=groot,instance.name=miniInstance,port=34595,process.name=manager,state=failed}
value=0
2024-05-08T10:32:57,014 [accumulo.METRICS] INFO :
accumulo.fate.tx{host=groot,instance.name=miniInstance,port=34595,process.name=manager,state=submitted}
value=0
2024-05-08T10:32:57,014 [accumulo.METRICS] INFO :
accumulo.fate.tx{host=groot,instance.name=miniInstance,port=34595,process.name=manager,state=failed_in_progress}
value=0
2024-05-08T10:32:57,014 [accumulo.METRICS] INFO :
accumulo.fate.tx{host=groot,instance.name=miniInstance,port=34595,process.name=manager,state=new}
value=0
2024-05-08T10:32:57,014 [accumulo.METRICS] INFO :
accumulo.fate.tx{host=groot,instance.name=miniInstance,port=34595,process.name=manager,state=in_progress}
value=0
```
In the first set of logs, 1 transaction is in_progress with op type
TABLE_COMPACT. In the second set, 0 transactions are in_progress, but 1 is
still reported as in progress doing op TABLE_COMPACT, which doesn't make sense.
Looking at the code and after some experimentation, this doesn't seem to be an
issue with the transaction still appearing in_progress in ZK; instead, it seems
like the old value is just being logged again. I can't narrow down exactly why
it's being logged again; someone more familiar with Metrics might be able to
understand this better.
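I can't say this is the cause, but one Micrometer pattern that avoids stale per-tag values is `MultiGauge`, where the full set of rows is replaced on every refresh so tags that no longer apply get dropped. A rough sketch under that assumption (class and map names are hypothetical, not the existing FateMetrics code):
```java
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.MultiGauge;
import io.micrometer.core.instrument.Tags;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustrative sketch only; "FateOpsByTypeSketch" and "countsByType" are
// hypothetical names, not existing Accumulo code.
public class FateOpsByTypeSketch {
  private final MultiGauge opsByType;

  public FateOpsByTypeSketch(MeterRegistry registry) {
    opsByType = MultiGauge.builder("accumulo.fate.ops.in.progress.by.type")
        .description("Number of in-progress FATE txns, by op type")
        .register(registry);
  }

  // Called on each refresh. overwrite=true replaces all previously
  // registered rows, so op types with no in-progress txns disappear
  // (or could be re-registered with an explicit 0).
  public void update(Map<String, Long> countsByType) {
    List<MultiGauge.Row<?>> rows = new ArrayList<>();
    countsByType.forEach((opType, count) ->
        rows.add(MultiGauge.Row.of(Tags.of("op.type", opType), count)));
    opsByType.register(rows, true);
  }
}
```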
And one question: I noticed that in elasticity these metrics are only obtained
by looking at FATE transactions in ZK (`List<AdminUtil.TransactionStatus>
currFates = admin.getTransactionStatus(Map.of(FateInstanceType.META,
metaFateStore), null, null, null);` in FateMetricValues). Should this also be
looking at the UserFateStore/Accumulo FATE table for transactions?
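For context, what I was imagining is roughly the following (just a sketch based on the call above; `userFateStore` is a hypothetical reference to the store backed by the Accumulo FATE table, and I haven't verified that `getTransactionStatus` handles both instance types in one call):
```java
// Sketch only: include both FATE stores so the metrics cover META (ZK)
// and USER (Accumulo FATE table) txns. "userFateStore" is hypothetical.
List<AdminUtil.TransactionStatus> currFates = admin.getTransactionStatus(
    Map.of(FateInstanceType.META, metaFateStore,
        FateInstanceType.USER, userFateStore),
    null, null, null);
```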