After more testing, it seems that restarting the service should be sufficient; you just need to be patient and wait. As long as the cluster (the master specifically?) isn't so busy that the join attempt times out (the timeout is configurable in elasticsearch.yml), the node will eventually join.

In light of this, I'm going to set the timeout to a value much, much longer than the default, unless someone can describe a downside.

I'm also curious whether the new candidate node needs to connect specifically to a master rather than just any node in the cluster. The docs and descriptions I've read so far only describe contacting the cluster generally.

I'm also curious (short of packet sniffing) whether, while joining, the candidate node sends join requests repeatedly, and if so at what interval. Is it close to a broadcast storm, very pedestrian, or maybe only a single request?

Tony
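For anyone who wants to measure how long a restarted node takes to rejoin rather than watching ES-head, here is a rough sketch that polls the cluster health API and reports when the node count reaches the expected value. The base URL, the 10-minute timeout, and the 5-second poll interval are my assumptions, not anything from the ES docs; the function names are made up for illustration.

```python
import json
import time
import urllib.request


def fetch_node_count(base_url="http://localhost:9200"):
    """Return number_of_nodes from GET /_cluster/health (URL is an assumption)."""
    with urllib.request.urlopen(base_url + "/_cluster/health", timeout=5) as resp:
        return json.load(resp)["number_of_nodes"]


def wait_for_nodes(expected, fetch=fetch_node_count, timeout=600.0, interval=5.0,
                   sleep=time.sleep, clock=time.monotonic):
    """Poll until the cluster reports at least `expected` nodes.

    Returns the seconds elapsed on success, or None if `timeout` passes first.
    `fetch`, `sleep` and `clock` are injectable so the loop can be tested
    without a running cluster.
    """
    start = clock()
    while True:
        try:
            if fetch() >= expected:
                return clock() - start
        except OSError:
            # HTTP not reachable yet (e.g. the node is still starting); keep waiting.
            pass
        if clock() - start >= timeout:
            return None
        sleep(interval)
```

Running `wait_for_nodes(expected=3)` right after restarting the service on a three-node cluster would return roughly how many seconds the join took, or None if it never happened within the timeout.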
On Tuesday, February 4, 2014 4:22:58 PM UTC-8, Tony Su wrote:
> Hi Mark,
> I've done all that to no effect.
>
> FYI if it makes a diff,
> I'm running on a distro that uses systemd, so in theory when the Service
> is started, it's supposed to create a cgroup in which the new process is
> run, and if there are any processes that are spawned (including but not
> limited to new ES processes), they're all supposed to be managed by that
> cgroup. This generally means that compared to SystemV when the cgroup is
> shutdown, it shuts down all child processes reliably, there are no orphaned
> processes that continue to run.
>
> So, when I stop the ES service, it really should be shutdown.
> But, when I start up again I've waited over 5 minutes on a small but
> active cluster accepting new data and the node never joins.
> But, after rebooting the orphaned node, and starting the ES service it
> rarely takes more than about 15 seconds to join (according to ES-head).
>
> Tony
>
> On Tuesday, February 4, 2014 2:10:14 PM UTC-8, Mark Walkom wrote:
>
>> If you give the service a restart, it's a stop and then a start
>> (obviously).
>> This will/should reread the config and attempt to rejoin the cluster in
>> the config.
>>
>> Can you try an explicit stop, then sleep for 5, then start? It could be
>> the process isn't properly closing when requested.
>>
>> Regards,
>> Mark Walkom
>>
>> Infrastructure Engineer
>> Campaign Monitor
>> email: [email protected]
>> web: www.campaignmonitor.com
>>
>> On 5 February 2014 04:22, Tony Su <[email protected]> wrote:
>>
>>> Unless I'm missing something in the docs or these forums,
>>>
>>> I've surprisingly found that if a node fails to join the cluster, it's
>>> not sufficient to simply restart ES on the machine. I would have thought
>>> that restarting ES thereby re-reading its config files should be sufficient
>>> to announce its intention to join the cluster.
>>>
>>> But, I haven't found that to be the case, every time I've had to reboot
>>> the entire machine to join the cluster.
>>>
>>> Is there a config I'm missing?
>>>
>>> Thx,
>>> Tony
