[ https://issues.apache.org/jira/browse/IGNITE-12080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16911630#comment-16911630 ]
Maxim Muzafarov commented on IGNITE-12080:
------------------------------------------

Folks,

I've looked through the changes and have a few questions regarding the implementation.

* Why is a static utility class used for collecting rebalance info? Why not, for instance, the DiagnosticProcessor (introduced recently)?
* Once IGNITE-3195 is merged, there will be no reason to collect statistics about the `rebalance topic`; it will be replaced with thread pools.
* Do you have any benchmarks with the `printRebalanceStatistics` property enabled? Since the rebalance procedure can run for 4-8 hours, it is necessary to check and analyze JVM metrics (GC, used heap, etc.). We can have thousands of Supply-Demand messages, and for each of them we hold a `RebalanceMessageStatistics` on the heap until the rebalance procedure finishes.
* The printed statistics are not in a human-readable format. Is that user-friendly? Moreover, it is up to the implementation to print statistics the right way in the logs. I think users should not need a table of abbreviations (e.g. `writeAliasesRebalanceStatistics`) to decode the logs.
* Do we have a TC run with the `printRebalanceStatistics` property enabled on all suites? It seems to me we can get a `NullPointerException` in some cases.
* Why is `RebalanceMessageStatistics` needed? I don't think that holding `sndMsgTime` for each message will be useful for rebalance statistics at all. The same goes for `rcvMsgTime`.
* I think `ReceivePartitionStatistics.msgSize` will be the same in 98% of cases. Do we need it?
* Do we need `PartitionStatistics` at all? Can the same value be obtained from the `onRebalanceKeyReceived` metrics and the end of the rebalance procedure?

Please do not merge the PR until all of these issues are resolved.
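
To illustrate the heap concern above: a minimal sketch, assuming the per-message timestamps are not needed in the final report, of keeping a few running counters instead of one statistics object per Supply-Demand message. The class `RebalanceStatsAggregate` and its methods are hypothetical names used only for illustration; they are not part of the PR or of Ignite.

```java
import java.util.concurrent.atomic.LongAdder;

/**
 * Hypothetical aggregate of rebalance message statistics.
 * Memory use is constant regardless of how many messages are processed.
 */
final class RebalanceStatsAggregate {
    /** Number of supply messages seen so far. */
    private final LongAdder msgCnt = new LongAdder();

    /** Total size of all supply messages, in bytes. */
    private final LongAdder totalBytes = new LongAdder();

    /** Total time spent processing supply messages, in nanoseconds. */
    private final LongAdder totalProcNanos = new LongAdder();

    /** Called once per received message; nothing is retained per message. */
    void onMessage(long msgSizeBytes, long processingNanos) {
        msgCnt.increment();
        totalBytes.add(msgSizeBytes);
        totalProcNanos.add(processingNanos);
    }

    long messages() {
        return msgCnt.sum();
    }

    long bytes() {
        return totalBytes.sum();
    }

    /** Average processing time per message, or 0 if no messages were seen. */
    long avgProcessingNanos() {
        long n = msgCnt.sum();

        return n == 0 ? 0 : totalProcNanos.sum() / n;
    }
}
```

With this shape the demander would hold three counters per supplier (or per cache group) for the whole 4-8 hour rebalance, at the cost of losing the per-message send and receive times.
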
> Add extended logging for rebalance
> ----------------------------------
>
>                 Key: IGNITE-12080
>                 URL: https://issues.apache.org/jira/browse/IGNITE-12080
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Kirill Tkalenko
>            Assignee: Kirill Tkalenko
>            Priority: Major
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> We should log all information about a finished rebalance on the demander node.
> I'd like to have in the log:
> h3. Total information:
> # Rebalance duration, rebalance start time/rebalance finish time
> # How many partitions were processed in each topic (number of partitions,
> number of entries, number of bytes)
> # How many nodes were suppliers in rebalance (nodeId, number of supplied
> partitions, number of supplied entries, number of bytes, duration of getting
> and processing partitions from supplier)
> h3. Information per cache group:
> # Rebalance duration, rebalance start time/rebalance finish time
> # How many partitions were processed in each topic (number of partitions,
> number of entries, number of bytes)
> # How many nodes were suppliers in rebalance (nodeId, number of supplied
> partitions, list of partition ids with PRIMARY/BACKUP flag, number of supplied
> entries, number of bytes, duration of getting and processing partitions from
> supplier)
> # Information about each partition distribution (list of nodeIds with
> primary/backup flag and marked supplier nodeId)
> h3. Information per supplier node:
> # How many partitions were requested:
> #* Total number
> #* Primary/backup distribution (number of primary partitions, number of
> backup partitions)
> #* Total number of entries
> #* Total size of partitions in bytes
> # How many partitions were requested per cache group:
> #* Number of requested partitions
> #* Number of entries in partitions
> #* Total size of partitions in bytes
> #* List of requested partitions with size in bytes, entry count, primary or
> backup partition flag

--
This message was sent by Atlassian Jira
(v8.3.2#803003)