#general


@noahprince8: Every now and then, after several restarts of brokers, controllers, servers, my local Pinot gets into a bad state where it shows number of segments 0/83. VerifySegmentState gives ```Segment: table_OFFLINE_1603953900214_1603953902314_7 idealstate: {Server_10.136.245.18_8003=ONLINE} is MISSING in external view: Segment: table_OFFLINE_1603953900214_1603953902314_7 idealstate: {Server_10.136.245.18_8003=ONLINE} does NOT match external view: null table_OFFLINE = ERROR``` Helix doesn’t seem to be assigning segments to my server. So the whole thing is just broken. Scorched earth (nuking ZK, reloading all the segments, etc) works, but if this were production how do I get things to work again? How do you debug something like this?
  @noahprince8: I think maybe it has something to do with rebooting a server with the same name: ```2020/11/08 11:16:08.472 WARN [ParticipantManager] [main] found another instance with same instanceName: Server_10.136.245.13_8003 in cluster quickstart```
  @noahprince8: Looking at a particular segment
  @noahprince8: It should be able to recover given all of the segments are in deep store.
  @fx19880617: Is your server IP address changing? Sometimes that can cause Pinot to start with a different instance name
  @noahprince8: It’s just running in intellij after some restarts
  @noahprince8: I think the IP is changing occasionally, though.
  @noahprince8: I imagine a similar thing would happen in k8s
  @noahprince8: Moving the discussion to troubleshooting.
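
For readers following along, the `MISSING in external view` message above boils down to comparing two maps: the ideal state (what Helix wants) and the external view (what servers actually report). The snippet below is a minimal sketch of that comparison, not Pinot's actual VerifySegmentState code. The function name `diff_ideal_vs_external` and the plain-dict inputs are hypothetical, and the empty external view is an assumption based on the `external view: null` output above.

```python
def diff_ideal_vs_external(ideal_state, external_view):
    """Return {segment: (expected, actual)} for every segment whose
    external-view replica map differs from its ideal-state replica map.
    `actual` is None when the segment is absent from the external view."""
    mismatches = {}
    for segment, expected in ideal_state.items():
        actual = external_view.get(segment)  # None means "missing"
        if actual != expected:
            mismatches[segment] = (expected, actual)
    return mismatches


# Illustrative input: the segment name and instance name come from the
# message above; the empty external view is an assumption.
ideal_state = {
    "table_OFFLINE_1603953900214_1603953902314_7": {
        "Server_10.136.245.18_8003": "ONLINE"
    },
}
external_view = {}

for segment, (expected, actual) in diff_ideal_vs_external(
    ideal_state, external_view
).items():
    print(f"{segment}: idealstate={expected} externalview={actual}")
```

A non-empty result only says that the two views disagree. It does not say why the server never picked up the segment, which is what the rest of the thread tries to find out.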

#troubleshooting


@noahprince8: @noahprince8 has joined the channel
@noahprince8: Thread for this
  @noahprince8: Sorry. Didn’t know this channel existed
  @noahprince8: It is in the ideal state: ``` "mapFields": { "table_OFFLINE_1603929600370_1603929899494_33": { "Server_10.136.245.18_8003": "ONLINE" },``` It just doesn’t seem like Helix is doing anything to reach that state?
  @mayanks: Do you see any messages pending for instance
  @mayanks: Or anything in the logs
  @noahprince8: Where would you see pending messages?
  @noahprince8: Nothing particularly interesting in server, broker, or controller logs
  @mayanks: You can find messages/errors/etc in ZK under `INSTANCES`
  @mayanks: I recall a helix bug fix that we pulled in PR-6166
  @ssubrama: Can you check your server instance state in helix? It should be in zookeeper under `INSTANCES/Server_10.136.245.18_8003`. Under this folder, the CURRENTSTATES folder should have a sessionid dir and a table dir underneath that. If this is not available, then at the same level you can look for ERRORS and see what you find there. If all these are greyed out, then the server 10.136.245.18 is not up. You can check under LIVEINSTANCES as to which servers are up
  @noahprince8: Can see the server but it has 0/83 segments. In zookeeper everything is empty except history for that server. ```{ "id": "Server_10.136.245.13_8003", "simpleFields": { "LAST_OFFLINE_TIME": "-1" }, "mapFields": {}, "listFields": { "HISTORY": [ "{DATE=2020-11-08T17:07:16:698, VERSION=0.9.8, SESSION=1006cf417e10022, TIME=1604855236698}", "{DATE=2020-11-08T17:16:43:638, VERSION=0.9.8, SESSION=1006cf417e1002b, TIME=1604855803638}", "{DATE=2020-11-09T00:38:17:866, VERSION=0.9.8, SESSION=1006cf417e10034, TIME=1604882297866}", "{DATE=2020-11-09T00:41:42:746, VERSION=0.9.8, SESSION=1006cf417e10035, TIME=1604882502746}" ], "OFFLINE": [ "2020-11-08T17:07:17:345", "2020-11-09T00:38:16:619", "2020-11-09T00:38:17:969" ] } }```
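
One thing that may be worth checking here: the ideal state quoted earlier names `Server_10.136.245.18_8003`, while the ZK node and the warning log name `Server_10.136.245.13_8003`. The snippet below is a minimal sketch of a cross-check between the instances the ideal state references and the instances that are live. The function name `instances_not_live` is hypothetical, the inputs are plain dicts and lists copied by hand from ZooKeeper, and treating `Server_10.136.245.13_8003` as the only entry under LIVEINSTANCES is an assumption, not something the thread confirms.

```python
def instances_not_live(ideal_state, live_instances):
    """Return {instance: [segments]} for every instance that the ideal
    state assigns segments to but that is not in the live-instance list."""
    live = set(live_instances)
    missing = {}
    for segment, replicas in ideal_state.items():
        for instance in replicas:
            if instance not in live:
                missing.setdefault(instance, []).append(segment)
    return missing


# Illustrative input: names come from this thread; the live-instance list
# is an assumption about what LIVEINSTANCES contains.
ideal_state = {
    "table_OFFLINE_1603929600370_1603929899494_33": {
        "Server_10.136.245.18_8003": "ONLINE"
    },
}
live_instances = ["Server_10.136.245.13_8003"]

for instance, segments in instances_not_live(ideal_state, live_instances).items():
    print(f"{instance} is not live but owns {len(segments)} segment(s)")
```

If this reports an instance, Helix has no live participant to send state transitions to for those segments, which would be consistent with the empty CURRENTSTATES seen above.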

#pinot-0-5-0-release


@sharadc2001: I have some other service which is running there
@sharadc2001: I am not able to find which
@sharadc2001: one is using this port. In netstat I could see port 8080 being used