[ 
https://issues.apache.org/jira/browse/IGNITE-12080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16911630#comment-16911630
 ] 

Maxim Muzafarov commented on IGNITE-12080:
------------------------------------------

Folks,

I've looked through the changes and have a few questions regarding the 
implementation.
 * Why is a static utility class used for collecting rebalance info? Why not, 
for instance, the DiagnosticProcessor (introduced recently)?
 * Once IGNITE-3195 is merged, there is no reason to collect statistics 
about the `rebalance topic`, since it will be replaced with thread pools.
 * Do you have any benchmarks with the `printRebalanceStatistics` property 
enabled? Since the rebalance procedure can run for 4-8 hours, it is necessary to 
check and analyze JVM metrics (GC, used heap, etc.). We can have thousands of 
Supply-Demand messages, and for each of them we hold a 
`RebalanceMessageStatistics` in the heap until the rebalance procedure finishes.
 * Printed statistics are not in a human-readable format. Is that 
user-friendly? Moreover, it is up to the implementation to print statistics to the 
logs the right way. I think we don't need any abbreviations (e.g. 
`writeAliasesRebalanceStatistics`) that have to be decoded to read the logs.
 * Do we have a TC run with the `printRebalanceStatistics` property enabled on 
all suites? It seems to me we can get a `NullPointerException` in some 
cases.
 * Why is `RebalanceMessageStatistics` needed? I don't think holding 
`sndMsgTime` for each message will be useful for rebalance statistics at all. 
The same goes for `rcvMsgTime`.
 * I think `ReceivePartitionStatistics`.`msgSize` will be the same in 98% of 
cases. Do we need it?
 * Do we need `PartitionStatistics` at all? Can the same value be obtained from 
the `onRebalanceKeyReceived` metrics at the end of the rebalance procedure?

Please do not merge the PR until all the issues are resolved.

> Add extended logging for rebalance
> ----------------------------------
>
>                 Key: IGNITE-12080
>                 URL: https://issues.apache.org/jira/browse/IGNITE-12080
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Kirill Tkalenko
>            Assignee: Kirill Tkalenko
>            Priority: Major
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> We should log all information about a finished rebalance on the demander node.
> I'd like to have in the log:
> h3. Total information:
> # Rebalance duration, rebalance start time/rebalance finish time
> # How many partitions were processed in each topic (number of partitions, 
> number of entries, number of bytes)
> # How many nodes were suppliers in rebalance (nodeId, number of supplied 
> partitions, number of supplied entries, number of bytes, duration of getting 
> and processing partitions from supplier)
> h3. Information per cache group:
> # Rebalance duration, rebalance start time/rebalance finish time
> # How many partitions were processed in each topic (number of partitions, 
> number of entries, number of bytes)
> # How many nodes were suppliers in rebalance (nodeId, number of supplied 
> partitions, list of partition ids with PRIMARY/BACKUP flag, number of supplied 
> entries, number of bytes, duration of getting and processing partitions from 
> supplier)
> # Information about each partition distribution (list of nodeIds with 
> primary/backup flag and marked supplier nodeId)
> h3. Information per supplier node:
> # How many partitions were requested: 
> #* Total number
> #* Primary/backup distribution (number of primary partitions, number of 
> backup partitions)
> #* Total number of entries
> #* Total size of partitions in bytes
> # How many partitions were requested per cache group:
> #* Number of requested partitions
> #* Number of entries in partitions
> #* Total size of partitions in bytes
> #* List of requested partitions with size in bytes, count entries, primary or 
> backup partition flag
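The per-supplier totals listed above could be kept as a fixed set of running counters per supplier node, so heap usage would grow with the number of suppliers rather than with the number of Supply-Demand messages, and the result could be printed without abbreviations. A minimal Java sketch under that assumption; all names here (`RebalanceStatsSketch`, `SupplierStats`, `onPartitionReceived`) are hypothetical and are not part of the Ignite API or the PR under review:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.UUID;

/**
 * Hypothetical sketch: aggregates rebalance statistics per supplier node
 * with a constant number of counters per supplier, instead of keeping one
 * statistics object per supply message. Not an Ignite API.
 */
public class RebalanceStatsSketch {
    /** Running totals for one supplier node. */
    static final class SupplierStats {
        long primaryParts;
        long backupParts;
        long entries;
        long bytes;
        long startNanos = -1; // time the first partition from this supplier was received
        long endNanos = -1;   // time the last partition from this supplier was received
    }

    // Insertion-ordered so suppliers are printed in the order they were first seen.
    private final Map<UUID, SupplierStats> bySupplier = new LinkedHashMap<>();

    /** Records one fully received partition from the given supplier. */
    public void onPartitionReceived(UUID supplier, boolean primary, long entries, long bytes, long nowNanos) {
        SupplierStats s = bySupplier.computeIfAbsent(supplier, id -> new SupplierStats());
        if (primary) {
            s.primaryParts++;
        } else {
            s.backupParts++;
        }
        s.entries += entries;
        s.bytes += bytes;
        if (s.startNanos < 0) {
            s.startNanos = nowNanos;
        }
        s.endNanos = nowNanos;
    }

    /** Returns the totals for a supplier, or null if nothing was received from it. */
    public SupplierStats stats(UUID supplier) {
        return bySupplier.get(supplier);
    }

    /** Formats one plain-English line per supplier; duration is first-to-last partition receipt. */
    public String format() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<UUID, SupplierStats> e : bySupplier.entrySet()) {
            SupplierStats s = e.getValue();
            sb.append(String.format(
                "supplier=%s, partitions=%d (primary=%d, backup=%d), entries=%d, bytes=%d, durationMs=%d%n",
                e.getKey(), s.primaryParts + s.backupParts, s.primaryParts, s.backupParts,
                s.entries, s.bytes, (s.endNanos - s.startNanos) / 1_000_000));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        RebalanceStatsSketch stats = new RebalanceStatsSketch();
        UUID node = UUID.fromString("00000000-0000-0000-0000-000000000001");
        stats.onPartitionReceived(node, true, 100, 4096, 0L);
        stats.onPartitionReceived(node, false, 50, 2048, 5_000_000L);
        System.out.print(stats.format());
    }
}
```

Keeping only aggregates means per-message timestamps such as send and receive times are not retained, which matches the question above about whether they are needed at all.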



--
This message was sent by Atlassian Jira
(v8.3.2#803003)
