[ https://issues.apache.org/jira/browse/KUDU-411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mike Percy updated KUDU-411:
----------------------------
Parent: KUDU-410
> Move the consensus RPCs out of tserver_service
> ----------------------------------------------
>
> Key: KUDU-411
> URL: https://issues.apache.org/jira/browse/KUDU-411
> Project: Kudu
> Issue Type: Sub-task
> Components: tablet
> Affects Versions: M4.5
> Reporter: Jean-Daniel Cryans
> Assignee: Jean-Daniel Cryans
>
> I tried creating a table with 150 tablets, with consensus set to 3 replicas,
> and I see the following:
> {noformat}
> I0729 11:37:58.501868 13007 raft_consensus.cc:168] T
> 3d57b78c37464f2091a4be3d05383653 P 8039032e27344e1d90a3db4b7a124781 [LEADER]:
> ChangeConfiguration(op=term: 0 index: 1, seqno=0): replicating to peers...
> W0729 11:37:58.513223 13000 consensus_peers.cc:390] Couldn't send request to
> peer 44696db5501b4ee0b6b233c90e2d8f88 for tablet
> 3d57b78c37464f2091a4be3d05383653: Status: Remote error: Service unavailable:
> UpdateConsensus request on kudu.tserver.TabletServerService from
> 10.20.188.124:52904 dropped due to backpressure. The service queue is full;
> it has 50 items.. Retrying in the next heartbeat period. Already tried 1
> times.
> W0729 11:37:59.502914 13000 consensus_peers.cc:390] Couldn't send request to
> peer a64e2f0c055d4d62a6cb88bdf5ec03c3 for tablet
> 3d57b78c37464f2091a4be3d05383653: Status: Timed out: Call timed out. Retrying
> in the next heartbeat period. Already tried 1 times.
> W0729 11:38:00.002295 13000 consensus_peers.cc:390] Couldn't send request to
> peer 44696db5501b4ee0b6b233c90e2d8f88 for tablet
> 3d57b78c37464f2091a4be3d05383653: Status: Timed out: Call timed out. Retrying
> in the next heartbeat period. Already tried 2 times.
> W0729 11:38:00.003197 13000 consensus_peers.cc:390] Couldn't send request to
> peer a64e2f0c055d4d62a6cb88bdf5ec03c3 for tablet
> 3d57b78c37464f2091a4be3d05383653: Status: Remote error: Service unavailable:
> UpdateConsensus request on kudu.tserver.TabletServerService from
> 10.20.188.124:42067 dropped due to backpressure. The service queue is full;
> it has 50 items.. Retrying in the next heartbeat period. Already tried 2
> times.
> ...
> W0729 11:50:35.821658 13000 consensus_peers.cc:390] Couldn't send request to
> peer a64e2f0c055d4d62a6cb88bdf5ec03c3 for tablet
> 3d57b78c37464f2091a4be3d05383653: Status: Remote error: Service unavailable:
> UpdateConsensus request on kudu.tserver.TabletServerService from
> 10.20.188.124:42067 dropped due to backpressure. The service queue is full;
> it has 50 items.. Retrying in the next heartbeat period. Already tried 1513
> times.
> W0729 11:50:35.821877 13000 consensus_peers.cc:390] Couldn't send request to
> peer 44696db5501b4ee0b6b233c90e2d8f88 for tablet
> 3d57b78c37464f2091a4be3d05383653: Status: Remote error: Service unavailable:
> UpdateConsensus request on kudu.tserver.TabletServerService from
> 10.20.188.124:52904 dropped due to backpressure. The service queue is full;
> it has 50 items.. Retrying in the next heartbeat period. Already tried 1513
> times.
> {noformat}
> All the tablet servers are trying to talk to each other and every service
> queue is full. At the same time, the master starts creating new tablets
> because these ones aren't being created within the given timeout. It's also
> interesting to see that the tablet is still alive after 13 minutes, even
> though it was deleted a minute after its creation:
> {noformat}
> W0729 11:39:00.035020 7432 catalog_manager.cc:1573] Tablet
> 3d57b78c37464f2091a4be3d05383653 (table lineitem
> [id=5b1b20d435674108930d47b1061cdf7d]) was not created within the allowed
> timeout. Replacing with a new tablet 0fc88a3798e64433bf76e8f9689617cc
> {noformat}
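
To illustrate why the issue title proposes moving the consensus RPCs out of tserver_service, here is a minimal sketch of a bounded service queue that drops calls once it is full. It is a toy model and not Kudu's actual implementation: the `ServiceQueue` class, the `simulate` function, and the call names other than UpdateConsensus are hypothetical, and the assumption that unrelated traffic fills the queue first is made only for illustration. The capacity of 50 is taken from the log messages above.

```python
from collections import deque


class ServiceQueue:
    """Bounded FIFO that rejects new calls once full (hypothetical, not a Kudu class)."""

    def __init__(self, name, capacity):
        self.name = name
        self.capacity = capacity
        self.items = deque()
        self.dropped = 0

    def offer(self, call):
        """Return True if the call was queued, False if it was dropped (backpressure)."""
        if len(self.items) >= self.capacity:
            self.dropped += 1
            return False
        self.items.append(call)
        return True


def simulate(shared):
    """Offer 150 unrelated calls, then 10 UpdateConsensus calls, with no draining.

    With shared=True the consensus calls use the same queue as everything else;
    with shared=False they get a separate queue. Returns the number of
    UpdateConsensus calls that were accepted.
    """
    tserver = ServiceQueue("tserver", 50)
    consensus = tserver if shared else ServiceQueue("consensus", 50)

    # Assumption: other traffic arrives first and nothing is dequeued meanwhile.
    for i in range(150):
        tserver.offer(("OtherCall", i))

    return sum(consensus.offer(("UpdateConsensus", i)) for i in range(10))


if __name__ == "__main__":
    print("shared queue, accepted:", simulate(shared=True))
    print("separate queue, accepted:", simulate(shared=False))
```

In this toy model the shared queue accepts none of the consensus calls because it is already at capacity, while the separate queue accepts all of them. A real server drains its queues concurrently, so the sketch only shows the isolation effect, not actual throughput.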