[ 
https://issues.apache.org/jira/browse/KUDU-3016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Serbin updated KUDU-3016:
--------------------------------
    Description: 
With the current structure of the system tablet rows that store tablet 
metadata, the catalog manager can create a very large write operation on the 
system tablet when processing full tablet reports sent from tablet servers.  
Depending on the {{--rpc_max_message_size}} setting, at some point a tablet 
report received from a tablet server comes through, but its Raft counterpart 
for the system tablet update doesn't, because the latter might be almost two 
times larger.  If that happens, the Kudu cluster becomes almost non-functional 
because of a self-perpetuating 
accepted-huge-tablet-report-but-cannot-push-Raft-update-to-follower-masters 
pattern.

The catalog manager should not lump together updates on all tablets received 
from one tablet server:  
https://github.com/apache/kudu/blob/3175c35c7d721aef0c4c6b358cc3b422089c1ba7/src/kudu/master/catalog_manager.cc#L4268-L4274
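
One way to avoid lumping all updates together is to split them into several 
smaller writes, each kept under a size budget.  The sketch below is only an 
illustration of that idea in Python, not the catalog manager's code and not 
the change made for this issue.  The names {{TabletUpdate}}, 
{{batch_updates}} and {{max_batch_bytes}} are hypothetical; in practice the 
budget would presumably be derived, with some margin, from 
{{--rpc_max_message_size}}.

```python
from typing import Iterable, Iterator, List, Tuple

# Hypothetical stand-in for one tablet's metadata update:
# (tablet_id, estimated_serialized_size_in_bytes). Not a Kudu type.
TabletUpdate = Tuple[str, int]


def batch_updates(
    updates: Iterable[TabletUpdate], max_batch_bytes: int
) -> Iterator[List[TabletUpdate]]:
    """Greedily group updates so each batch stays within max_batch_bytes.

    A single update larger than the budget is emitted alone in its own
    batch; this sketch does not try to split an individual update.
    """
    batch: List[TabletUpdate] = []
    batch_bytes = 0
    for update in updates:
        _, size = update
        # Close the current batch if adding this update would exceed the budget.
        if batch and batch_bytes + size > max_batch_bytes:
            yield batch
            batch, batch_bytes = [], 0
        batch.append(update)
        batch_bytes += size
    if batch:
        yield batch


# Example: four updates from one tablet report, budget of 100 bytes per write.
report = [("t1", 40), ("t2", 40), ("t3", 40), ("t4", 100)]
batches = list(batch_updates(report, max_batch_bytes=100))
# batches == [[("t1", 40), ("t2", 40)], [("t3", 40)], [("t4", 100)]]
```

Each batch would then become a separate write to the system tablet, so no 
single Raft update has to carry the whole tablet report.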


  was:
With current structure of system tablet for rows storing metadata information 
on tablets, the catalog manager can create a very large write operation on the 
system tablet when processing full tablet reports sent from tablet servers.  At 
some point (depends on the {{--rpc_max_message_size}} setting), a tablet 
report received from a tablet server comes through, but its Raft counterpart 
for the update can be almost two times larger.  If that happens, Kudu cluster 
becomes almost non-functional because of self-perpetuating 
accepted-huge-tablet-report-but-cannot-push-Raft-update-to-follower-masters 
pattern.

The catalog manager should not lump together updates on all tablets received 
from one tablet server:  
https://github.com/apache/kudu/blob/3175c35c7d721aef0c4c6b358cc3b422089c1ba7/src/kudu/master/catalog_manager.cc#L4268-L4274



> Catalog manager: don't lump together all updates from one tablet report
> -----------------------------------------------------------------------
>
>                 Key: KUDU-3016
>                 URL: https://issues.apache.org/jira/browse/KUDU-3016
>             Project: Kudu
>          Issue Type: Improvement
>          Components: master
>            Reporter: Alexey Serbin
>            Assignee: Alexey Serbin
>            Priority: Major
>              Labels: scalability
>
> With the current structure of the system tablet rows that store tablet 
> metadata, the catalog manager can create a very large write operation on the 
> system tablet when processing full tablet reports sent from tablet servers.  
> Depending on the {{--rpc_max_message_size}} setting, at some point a tablet 
> report received from a tablet server comes through, but its Raft counterpart 
> for the system tablet update doesn't, because the latter might be almost two 
> times larger.  If that happens, the Kudu cluster becomes almost 
> non-functional because of a self-perpetuating 
> accepted-huge-tablet-report-but-cannot-push-Raft-update-to-follower-masters 
> pattern.
> The catalog manager should not lump together updates on all tablets received 
> from one tablet server:  
> https://github.com/apache/kudu/blob/3175c35c7d721aef0c4c6b358cc3b422089c1ba7/src/kudu/master/catalog_manager.cc#L4268-L4274



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
