Can you list what the contention points are according to you? Feel free to do that in the PR as well, I want to make sure we have the same understanding of the points that still need to be resolved. From where I stand, there is no major issue pending other than one polishing issue that I brought upon in the PR.
-Flavio > On 8 May 2018, at 21:08, Andor Molnar <an...@cloudera.com> wrote: > > I'm happy to do that once we have an agreement. > > > > > On Tue, May 8, 2018 at 8:34 AM, Flavio Junqueira <f...@apache.org> wrote: > >> It might be a good idea to document whatever we end up doing. >> >> -Flavio >> >>> On 8 May 2018, at 17:22, Andor Molnar <an...@cloudera.com> wrote: >>> >>> "If refactoring is necessary to address the issue, then it should be part >>> of the PR." >>> >>> Agreed. I think it is and it also makes things a lot more simpler, so >> it's >>> easier to review. >>> >>> "I'm not sure what kind of confirmation you are after here. Could you >>> elaborate, please?" >>> >>> I'm just wondering what could have been the reason for caching host >>> addresses in StaticHostProvider in the first place. >>> >>> "The other solution, if I remember enough of it, tried to avoid resolving >>> unnecessarily, but perhaps the DNS client cache is enough..." >>> >>> That's exactly my point: what JDK is doing internally is perfectly fine >> for >>> us and we don't need extra logic here. >>> >>> "Could you elaborate on this point, please? What exactly do you mean with >>> this statement?" >>> >>> See my previous point. Caching is already done in all DNS servers in the >>> chain and also there's also caching in JDK already, which means by >> calling >>> getByName() you don't necessarily fire a DNS request, only when the TTL >> is >>> expired. >>> >>> Andor >>> >>> >>> >>> >>> On Tue, May 8, 2018 at 8:12 AM, Flavio Junqueira <f...@apache.org> wrote: >>> >>>> Hi Andor, >>>> >>>> Thanks for your work on addressing the issue. >>>> >>>>> On 8 May 2018, at 16:06, Andor Molnar <an...@cloudera.com> wrote: >>>>> >>>>> Hi, >>>>> >>>>> Updating this thread, because the PR is still being review on GitHub. >>>>> >>>>> So, the reason why I refactored the original behaviour of >>>>> StaticHostProvider is that I believe that it's trying to do something >>>> which >>>>> is not its responsibility. >>>> >>>> If refactoring is necessary to address the issue, then it should be part >>>> of the PR. Otherwise, it is better to refactor in a separate PR. >>>> >>>> >>>>> Please tell me if there's a good historical >>>>> reason for that. >>>> >>>> I'm not sure what kind of confirmation you are after here. Could you >>>> elaborate, please? >>>> >>>>> >>>>> My approach is giving the user the following to options: >>>>> 1- Use static IP addresses, if you don't want to deal with DNS >> resolution >>>>> at all - we guarantee that no DNS logic will involved in this case at >>>> all. >>>> >>>> Sounds fine. >>>> >>>>> 2- Use DNS hostnames if you have a reliable DNS service for resolution >>>>> (with HA, secondary servers, backups, etc.) - we must use DNS in the >>>> right >>>>> way in this case e.g. do NOT cache IP address for a longer period that >>>> DNS >>>>> server allows to and re-resolve after TTL expries, because it's >> mandatory >>>>> by protocol. >>>> >>>> Sounds fine. >>>> >>>>> >>>>> My 2 cents here: >>>>> - the fix which was originally posted for re-resolution is a workaround >>>> and >>>>> doesn't satisfy the requirement for #2, >>>> >>>> The other solution, if I remember enough of it, tried to avoid resolving >>>> unnecessarily, but perhaps the DNS client cache is enough to avoid the >>>> unnecessary round-trips. This observation actually brings me to the next >>>> point: >>>> >>>>> - the solution is already built-in in JDK and DNS clients in the right >>>> way >>>> >>>> Could you elaborate on this point, please? What exactly do you mean with >>>> this statement? >>>> >>>>> - can't see a reason why we shouldn't use that >>>>> >>>>> I checked this in some other projects as well and found very similar >>>>> approach in hadoop-common's SecurityUtil.java. It has 2 built-in >> plugins >>>>> for that: >>>>> - Standard resolver uses java's built-in getByName(). >>>>> - Qualified resolver still uses getByName(), but adds some logic to >> avoid >>>>> incorrect re-resolutions and reverse IP lookups. >>>>> >>>>> Please let me know your thoughts. >>>>> >>>>> Regards, >>>>> Andor >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> On Tue, Mar 6, 2018 at 8:12 AM, Andor Molnar <an...@cloudera.com> >> wrote: >>>>> >>>>>> Hi Abe, >>>>>> >>>>>> Unfortunately we haven't got any feedback yet. What do you think of >>>>>> implementing Option #3? >>>>>> >>>>>> Regards, >>>>>> Andor >>>>>> >>>>>> >>>>>> On Thu, Feb 22, 2018 at 6:06 PM, Andor Molnar <an...@cloudera.com> >>>> wrote: >>>>>> >>>>>>> Did anybody happen to take a quick look by any chance? >>>>>>> >>>>>>> I don't want to push this too hard, because I know it's a time >>>> consuming >>>>>>> topic to think about, but this is a blocker in 3.5 which has been >>>> hanging >>>>>>> around for a while and any feedback would be extremely helpful to >>>> close it >>>>>>> quickly. >>>>>>> >>>>>>> Thanks, >>>>>>> Andor >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Mon, Feb 19, 2018 at 12:18 PM, Andor Molnar <an...@cloudera.com> >>>>>>> wrote: >>>>>>> >>>>>>>> Hi all, >>>>>>>> >>>>>>>> We need more eyes and brains on the following PR: >>>>>>>> >>>>>>>> https://github.com/apache/zookeeper/pull/451 >>>>>>>> >>>>>>>> I added a comment few days ago about the way we currently do DNS >> name >>>>>>>> resolution in this class and a suggestion on how we could simplify >>>> things a >>>>>>>> little bit. We talked about it with Abe Fine, but we're a little bit >>>> unsure >>>>>>>> and cannot get a conclusion. It would be extremely handy to get more >>>>>>>> feedback from you. >>>>>>>> >>>>>>>> To add some colour to it, let me elaborate on the situation here: >>>>>>>> >>>>>>>> In general, the task that StaticHostProvider does is to get a list >> of >>>>>>>> potentially unresolved InetSocketAddress objects, resolve them and >>>> iterate >>>>>>>> over the resolved objects by calling next() method. >>>>>>>> >>>>>>>> *Option #1 (current logic)* >>>>>>>> - Resolve addresses with getAllByName() which returns a list of IP >>>>>>>> addresses associated with the address. >>>>>>>> - Cache all these IP's, shuffle them and iterate over. >>>>>>>> - If client is unable to connect to an IP, remove all IPs from the >>>> list >>>>>>>> which the original servername was resolved to and re-resolve it. >>>>>>>> >>>>>>>> *Option #2 (getByName())* >>>>>>>> - Resolve address with getByName() instead which returns only the >>>> first >>>>>>>> IP address of the name, >>>>>>>> - Do not cache IPs, >>>>>>>> - Shuffle the *names* and resolve with getByName() *every time* when >>>>>>>> next() is called, >>>>>>>> - JDK's built-in caching will prevent name servers from being >> flooded >>>>>>>> and will do the re-resolution automatically when cache expires, >>>>>>>> - Names with multiple IPs will be handled by DNS servers which (if >>>>>>>> configured properly) return IPs in different order - this is called >>>> DNS >>>>>>>> Round Robin -, so getByName() will return different IP on each call. >>>>>>>> >>>>>>>> *Options #3* >>>>>>>> - There's a small problem with option#2: if DNS server is not >>>> configured >>>>>>>> properly and handles the round-robin case in a way that it always >>>> return >>>>>>>> the IP list in the same order, getByName() will never return the >> next >>>> ip, >>>>>>>> - In order to overcome that, use getAllByName() instead, shuffle the >>>>>>>> list and return the first IP. >>>>>>>> >>>>>>>> All feedback if much appreciated. >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Andor >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>> >>>> >> >>