Say you restart all instances in the cluster and the status for some host goes missing. Now when you start a host replacement, the new host won’t learn about the host whose status is missing, and the view of this host will be wrong.
PS: I will be happy to be proved wrong as I can also start using Gossip snitch :)

> On Oct 19, 2018, at 2:41 PM, Jeremy Hanna <jeremy.hanna1...@gmail.com> wrote:
>
> Do you mean to say that during host replacement there may be a time when the
> old->new host isn’t fully propagated and therefore wouldn’t yet be in all
> system tables?
>
>> On Oct 17, 2018, at 4:20 PM, sankalp kohli <kohlisank...@gmail.com> wrote:
>>
>> This is not the case during host replacement correct?
>>
>> On Tue, Oct 16, 2018 at 10:04 AM Jeremiah D Jordan
>> <jeremiah.jor...@gmail.com> wrote:
>>
>>> As long as we are correctly storing such things in the system tables and
>>> reading them out of the system tables when we do not have the information
>>> from gossip yet, it should not be a problem. (As far as I know GPFS does
>>> this, but I have not done extensive code diving or testing to make sure all
>>> edge cases are covered there)
>>>
>>> -Jeremiah
>>>
>>>> On Oct 16, 2018, at 11:56 AM, sankalp kohli <kohlisank...@gmail.com> wrote:
>>>>
>>>> Will GossipingPropertyFileSnitch not be vulnerable to Gossip bugs where we
>>>> lose hostId or some other fields when we restart C* for large
>>>> clusters(~1000 instances)?
>>>>
>>>>> On Tue, Oct 16, 2018 at 7:59 AM Jeff Jirsa <jji...@gmail.com> wrote:
>>>>>
>>>>> We should, but the 4.0 features that log/reject verbs to invalid replicas
>>>>> solves a lot of the concerns here
>>>>>
>>>>> --
>>>>> Jeff Jirsa
>>>>>
>>>>>> On Oct 16, 2018, at 4:10 PM, Jeremy Hanna <jeremy.hanna1...@gmail.com> wrote:
>>>>>>
>>>>>> We have had PropertyFileSnitch for a long time even though
>>>>>> GossipingPropertyFileSnitch is effectively a superset of what it offers
>>>>>> and is much less error prone. There are some unexpected behaviors when
>>>>>> things aren’t configured correctly with PFS. For example, if you replace
>>>>>> nodes in one DC and add those nodes to that DCs property files and not
>>>>>> the other DCs property files - the resulting problems aren’t very
>>>>>> straightforward to troubleshoot.
>>>>>>
>>>>>> We could try to improve the resilience and fail fast error checking and
>>>>>> error reporting of PFS, but honestly, why wouldn’t we deprecate and
>>>>>> remove PropertyFileSnitch? Are there reasons why GPFS wouldn’t be
>>>>>> sufficient to replace it?
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
>>>>>> For additional commands, e-mail: dev-h...@cassandra.apache.org