#general


@chundong.wang: @chundong.wang has joined the channel
@anshu.jalan: @anshu.jalan has joined the channel
@snlee727: @snlee727 has joined the channel
@jiapengtao0: @jiapengtao0 has joined the channel

#random


@chundong.wang: @chundong.wang has joined the channel
@anshu.jalan: @anshu.jalan has joined the channel
@snlee727: @snlee727 has joined the channel
@jiapengtao0: @jiapengtao0 has joined the channel

#troubleshooting


@snlee727: @snlee727 has joined the channel

#pinot-dev


@chundong.wang: @chundong.wang has joined the channel

#metadata-push-api


@snlee727: @snlee727 has joined the channel

#pinot-realtime-table-rebalance


@tingchen: @tingchen has joined the channel
@tingchen: @tingchen set the channel purpose: Discussion about Pinot table rebalance
@yupeng: @yupeng has joined the channel
@ujwala.tulshigiri: @ujwala.tulshigiri has joined the channel
@jackie.jxt: @jackie.jxt has joined the channel
@npawar: @npawar has joined the channel
@ssubrama: @ssubrama has joined the channel
@tingchen: Created this channel to discuss the Pinot rebalance issues we faced earlier at Uber.
@tingchen: We have a heavily used Pinot tenant which had 6 servers to begin with.
@tingchen: I added 3 new servers with identical specs to that tenant. The problem is that the 3 new servers showed twice as much CPU load as the other 6.
@tingchen: The 3 new servers also have 50% more documents than the other 6. So it seems that table rebalance does not distribute the data evenly among all servers?
@tingchen: Another related question: is replica-group routing available for realtime LLC queries?
@tingchen: the examples above use OFFLINE table. @jackie.jxt @npawar
@npawar: replica groups for realtime:
@npawar: it does distribute evenly. but it is possible that some servers have 1 more consuming partition than the others
@npawar: does the ideal state show a bigger imbalance?
@tingchen: yes
@tingchen: ```tingchen@streampinot-prod40-dca8:~$ upinot-admin.sh ShowIdealState storeindex_search_history | grep CONSUMING
    "Server_streampinot-prod04-dca8_7090": "CONSUMING",
    "Server_streampinot-prod05-dca8_7090": "CONSUMING",
    "Server_streampinot-prod06-dca8_7090": "CONSUMING"
    "Server_streampinot-prod164-dca8_7090": "CONSUMING",
    "Server_streampinot-prod165-dca8_7090": "CONSUMING",
    "Server_streampinot-prod166-dca8_7090": "CONSUMING"
    "Server_streampinot-prod167-dca8_7090": "CONSUMING",
    "Server_streampinot-prod168-dca8_7090": "CONSUMING",
    "Server_streampinot-prod169-dca8_7090": "CONSUMING"
    "Server_streampinot-prod04-dca8_7090": "CONSUMING",
    "Server_streampinot-prod05-dca8_7090": "CONSUMING",
    "Server_streampinot-prod06-dca8_7090": "CONSUMING"```
@npawar: this looks fine and balanced rt?
@jackie.jxt: Seems the problem is that there are 12 kafka partition-replicas (stream partitions × replication), but only 9 servers
@jackie.jxt: So 3 servers will have 2 partitions to consume, while the other 6 have 1 partition each
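The arithmetic above can be sketched as follows. This is a simplified round-robin illustration of spreading 4 partitions × 3 replicas = 12 consuming partition-replicas across 9 servers, not Pinot's actual assignment code, and the server names are made up:

```python
from collections import Counter

def assign_round_robin(num_partitions, num_replicas, servers):
    """Spread partition-replicas across servers round-robin.

    Illustrative only -- not Pinot's exact segment-assignment
    algorithm. Returns a Counter of consuming partitions per server.
    """
    assignment = Counter()
    i = 0
    for _partition in range(num_partitions):
        for _replica in range(num_replicas):
            assignment[servers[i % len(servers)]] += 1
            i += 1
    return assignment

servers = [f"server{n}" for n in range(9)]
counts = assign_round_robin(num_partitions=4, num_replicas=3, servers=servers)
# 12 partition-replicas over 9 servers: 3 servers carry 2, 6 carry 1
print(sorted(counts.values(), reverse=True))  # → [2, 2, 2, 1, 1, 1, 1, 1, 1]
```

With 12 servers instead of 9, every server would carry exactly one consuming partition, which matches the 6→12 conclusion later in the thread.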
@tingchen: yes. so the 9 servers' perf is in fact worse than 6 servers.
@yupeng: no. this topic has 4 partitions
@yupeng: but a replication factor of 3
@tingchen: I also observed that the document distribution is not even -- the latter 3 servers have 50% more documents than the original 6.
@tingchen: is that expected?
@jackie.jxt: The difference between LLC and realtime is that all the segments for one partition will be hosted on the same server, so think of partition as the smallest unit of the table
  @npawar: won't the completed segments get distributed evenly though? if you run rebalance
  @jackie.jxt: No, unless you configure the COMPLETED segment assignment
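For context on the COMPLETED segment assignment mentioned above: Pinot lets you relocate completed (sealed) segments to differently tagged servers via a tag override in the table's tenant config, so rebalance can then spread them evenly. A hedged sketch — the tenant and tag names below are placeholders, not from this thread:

```json
{
  "tenants": {
    "server": "myTenant",
    "broker": "myTenant",
    "tagOverrideConfig": {
      "realtimeConsuming": "myTenant_REALTIME",
      "realtimeCompleted": "myTenant_OFFLINE"
    }
  }
}
```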
@tingchen: what do you mean by the diff between LLC and realtime? I thought LLC is realtime?
@jackie.jxt: Sorry, LLC and offline
@jackie.jxt: Performance wise, 9 servers should be similar to 6 servers because the server load on the 3 new servers is the same as before
@jackie.jxt: Do you use partitioning or replica-group routing for this table?
@yupeng: we use default
@tingchen: we want to use replica-group routing for this tenant (right now it has 12 servers)
@tingchen: otherwise each query gets fanned out to all 12 servers now -- so latency depends on the slowest server.
@jackie.jxt: For LLC table, because of the nature of the streaming partition, the segments are already assigned into replica-groups
@tingchen: ```{
  "tableName": "pinotTable",
  "tableType": "REALTIME",
  "routing": {
    "instanceSelectorType": "replicaGroup"
  }
  ..
}```
@jackie.jxt: Simply enabling replica-group routing should do the trick
@jackie.jxt: Yes, correct
@tingchen: so we just need to add the above and restart broker?
@jackie.jxt: Let me check, I think we have an API to avoid restarting broker
@jackie.jxt: You can use the broker rebuild routing API to enable it: ```@PUT @Produces(MediaType.TEXT_PLAIN) @Path("/routing/{tableName}")```
@jackie.jxt: (table name here is the full table name, e.g. `pinotTable_REALTIME`)
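Putting those two messages together, a hypothetical invocation of that broker endpoint might look like this; the broker host and port are assumptions for illustration, adjust for your deployment:

```shell
# Ask the broker to rebuild routing for the realtime table, so the new
# instanceSelectorType takes effect without a broker restart.
# localhost:8099 is an assumed broker address, not from this thread.
curl -X PUT "http://localhost:8099/routing/pinotTable_REALTIME"
```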
@npawar: though this won't solve your issue of some servers seeing twice the load. 1 server in each replica group is still going to have the same behavior
@jackie.jxt: I think they already scaled up the cluster to 12 servers
@npawar: o okay
@tingchen: yes.
@tingchen: looks like 6-9 was not a good idea. 6->12 is.
@yupeng: Thanks @npawar @jackie.jxt for the help