[ 
https://issues.apache.org/jira/browse/IGNITE-12935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17095690#comment-17095690
 ] 

Ivan Rakov commented on IGNITE-12935:
-------------------------------------

My comments:
1, 2 - agree
regarding NOT_RESERVED_WAL_REASON - I'd keep it to track cases when 
CheckpointHistory provide entries that aren't present in WAL anymore (shouldn't 
happen, but just in case)
3 - we log best partition only in case we didn't succeed in finding any 
partition suitable for WAL rebalance. Logging other partition is redundant: 
their history is even more shallow.
4 - agree
5 - totally agree, this is crucial. Forgot about it during my review
6 - agree
7 - arguable. Main purpose of these logs is investigation of full rebalance 
reasons post factum. Anyway, lots of messages about lots of partition will 
occur only on PME when some partitions need to be rebalanced, which shouldn't 
happed very often.

> Disadvantages in log of historical rebalance
> --------------------------------------------
>
>                 Key: IGNITE-12935
>                 URL: https://issues.apache.org/jira/browse/IGNITE-12935
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Vladislav Pyatkov
>            Assignee: Vladislav Pyatkov
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> # Mention in the log only partitions for which there are no nodes that suit 
> as historical supplier
>  For these partitions, print minimal counter (since which we should perform 
> historical rebalancing) with corresponding node and maximum reserved counter 
> (since which cluster can perform historical rebalancing) with corresponding 
> node.
>  This will let us know:
>  ## Whether history was reserved at all
>  ## How much reserved history we lack to perform a historical rebalancing
>  ## I see resulting output like this:
> {noformat}
>  Historical rebalancing wasn't scheduled for some partitions:
>  History wasn't reserved for: [list of partitions and groups]
>  History was reserved, but minimum present counter is less than maximum 
> reserved: [[grp=GRP, part=ID, minCntr=cntr, minNodeId=ID, maxReserved=cntr, 
> maxReservedNodeId=ID], ...]{noformat}
>  ## We can also aggregate previous message by (minNodeId) to easily find the 
> exact node (or nodes) which were the reason of full rebalance.
>  # Log results of {{reserveHistoryForExchange()}}. They can be compactly 
> represented as mappings: {{(grpId -> checkpoint (id, timestamp))}}. For every 
> group, also log message about why the previous checkpoint wasn't successfully 
> reserved.
>  There can be three reasons:
>  ## Previous checkpoint simply isn't present in the history (the oldest is 
> reserved)
>  ## WAL reservation failure (call below returned false)
> {code:java}
> chpEntry = entry(cpTs);
> boolean reserved = cctx.wal().reserve(chpEntry.checkpointMark());// If 
> checkpoint WAL history can't be reserved, stop searching. 
> if (!reserved) 
>   break;{code}
> ## Checkpoint was marked as inapplicable for historical rebalancing
> {code:java}
> for (Integer grpId : new HashSet<>(groupsAndPartitions.keySet()))
>    if (!isCheckpointApplicableForGroup(grpId, chpEntry))
>      groupsAndPartitions.remove(grpId);{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to